1 Introduction


As the integration of hardware and software continues to evolve, production 
systems are becoming increasingly intricate, now often referred to as Cyber- 
Physical Production Systems (CPPS). Particularly, Artificial Intelligence (AI) 
can be instrumental in improving processes such as anomaly detection, op- 
timization, or predictive maintenance. However, at the moment, incorporat- 
ing AI algorithms into these systems is far from straightforward; it demands 
substantial time, financial resources, and expertise. Adopting a standardized 
architecture could facilitate the integration of AI technologies, especially for 
small and medium-sized enterprises, empowering them to remain competitive. 
For this reason, the Cognitive Architecture for Artificial Intelligence (CAAI)
was introduced in [1] as a cognitive architecture for AI in CPPS. The goal of
the system is to reduce the implementation effort by creating a standard
architecture. The core of the CAAI is a cognitive module that processes the
user’s declarative goals, selects suitable models and algorithms, and creates a 
configuration for the execution of a processing pipeline on a big data platform. 
During the revision of the CAAI project, it became obvious that there are 
existing limitations in automating the algorithm pipeline development process 
for all environments and use cases. This is mainly due to rapidly changing 
software interfaces and transfer complexities. Additionally, it became apparent 
that the architecture in the use case for CPPS provides a good environment for 
the implementation of online machine learning (OML) algorithms. This can be 
explained by the continuous data streams produced by the system’s machines. 
This abstract assesses the potential advantages of OML algorithms through an 
analysis of a real-world application in slitting machines. 


Section 2 briefly describes the concept of OML. Furthermore, it includes a
description of the experimental setup, including its real-world application. In 
Section 3 the results of the experiment are discussed. Finally, a short conclu- 
sion is presented at the end of the abstract. 


2 Materials and Methods 


2.1 Online Machine Learning 


The amount of data generated from various sources has increased enormously 
in recent years ("Big Data"). Technological advances have enabled the con- 
tinuous collection of data. Web, social media, share prices, search queries, 
but also sensors of modern machines produce continuous streams of data. It
becomes more and more challenging to store and process these infinite streams. 
Traditional batch machine learning is the common strategy to train machine 
learning models. It basically boils down to the following steps[2]: 


1. Loading and pre-processing the training data;
2. Fitting a model to the data;
3. Calculating the performance of the model on the test data.

In modern OML approaches, we do not train the model with the entire dataset
at once; instead, we update it incrementally with the newly arriving data. This 
way, we avoid storing large amounts of data by discarding it after the updating 
process. This procedure might be beneficial in terms of time and memory 
consumption. Additionally, this continuous updating allows OML methods to 
better address structural changes within the data, known as concept drift. The
introduction of different methods of incremental learning has been quite slow 
over the years, but the situation is changing at the moment [3] [4] [5]. 
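
The incremental update scheme can be illustrated with a minimal sketch. It uses a plain linear model updated by stochastic gradient descent, chosen only for brevity; it is not the model used in the experiments. The class `OnlineLinearRegressor`, the helper `prequential_mae`, and the synthetic `drifting_stream` are hypothetical names introduced for illustration, and the method names `learn_one` and `predict_one` merely mirror the naming convention of the 'river' library. Each sample is first used for a prediction, then for one update, and is then discarded.

```python
import random


class OnlineLinearRegressor:
    """Hypothetical minimal online learner: one SGD step per arriving sample."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_one(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def learn_one(self, x, y):
        # Gradient step on the squared error of this single sample.
        err = self.predict_one(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def prequential_mae(model, stream):
    """Predict first, then update; no sample is stored after its update."""
    total, n = 0.0, 0
    for x, y in stream:
        total += abs(model.predict_one(x) - y)
        n += 1
        model.learn_one(x, y)
    return total / n if n else float("nan")


def drifting_stream(n, seed=0):
    """Synthetic stream whose slope flips halfway (an abrupt concept drift)."""
    rng = random.Random(seed)
    for t in range(n):
        x = [rng.uniform(-1.0, 1.0)]
        slope = 2.0 if t < n // 2 else -2.0
        yield x, slope * x[0] + rng.gauss(0.0, 0.05)


model = OnlineLinearRegressor(n_features=1)
mae = prequential_mae(model, drifting_stream(2000))
```

After the drift, the model is pulled toward the new slope by the ongoing updates, whereas a model fitted once on the first half of the stream would keep predicting with the outdated slope.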


To compare OML with the classical batch learning method in our experiments, 
a Hoeffding Tree regressor (HTR) from the Python ’river’ library [3] is used 
for online learning, while a Decision Tree regressor from the ’scikit-learn’ [8] 
package is used for the classical approaches. In online machine learning,
Hoeffding trees are preferred because they do not revisit previously used
instances; instead, they wait for new instances to arrive [7]. Given their
incremental learning capacity, Hoeffding trees are better suited to a
data streaming context than traditional methods.


2.2 Experiment Setup: Slitting Machines 


In the experiments discussed in this work, data was collected using a test setup 
for winding stations from "Kampf Schneid- und Wickeltechnik GmbH & Co. 
KG", a company that specializes in building machines for slitting and winding 
web-shaped materials. The idea of the experiment is to use the motor torque 
and revolution values of a slitting machine to predict the vibration level. All 
features are available in the form of time series with measuring intervals of
10 ms. 


For our experiment, we divide the data into a training and a test set. The goal 
is to compare the prediction performance over an evaluation horizon that is 
subdivided into segments of 150 data points. Additionally, we analyze the 
calculation time and memory consumption during the evaluation. To com- 
pare OML with classical approaches, and to assess their respective strengths 
and weaknesses, we utilize four different approaches in our experiment. For 
all approaches, we train the initial model based on the training set before
we start with the evaluation process. Three of the algorithms belong to the 
group of batch machine learning techniques. The first is the classical batch 
learning approach, where the model is trained only once on the training set 
and subsequently evaluated on the test set. Another method is the landmark 
approach, where we add the data from the current horizon to the training set 
after each evaluation step, and then train the model from scratch. The final 
batch learning method is the shifting window approach. Unlike the landmark 
approach, the algorithm is not trained on the entire set of observed data, but 
rather on a moving window of data. In our case, this window is the size of 
the initial training set, meaning we add 150 data points with each evaluation 
step and remove the earliest 150 data points. The last evaluation technique 
is the pure OML approach where we update the model incrementally after 
each prediction step with the new data. The implementations of all evaluation 
strategies can be found under the following link:
https://github.com/sequential-parameter-optimization/spotRiver.
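
The difference between the three batch strategies lies only in which data the model is retrained on before each evaluation segment. The following sketch makes this explicit by listing, for each evaluation step, the training range and the evaluation range as half-open index intervals. The function `evaluation_plan` and its parameters are hypothetical and are not part of the spotRiver API; the OML approach has no retraining range, since the model is instead updated sample by sample after each prediction.

```python
def evaluation_plan(strategy, n_train, n_total, horizon=150):
    """Return [(train_range, eval_range), ...] as half-open index intervals.

    strategy: "batch"    -> always train on the initial training set
              "landmark" -> train on all data observed so far
              "window"   -> train on the most recent n_train points
    """
    plan = []
    for eval_start in range(n_train, n_total, horizon):
        eval_end = min(eval_start + horizon, n_total)
        if strategy == "batch":
            train = (0, n_train)
        elif strategy == "landmark":
            train = (0, eval_start)
        elif strategy == "window":
            train = (eval_start - n_train, eval_start)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        plan.append((train, (eval_start, eval_end)))
    return plan


for name in ("batch", "landmark", "window"):
    print(name, evaluation_plan(name, n_train=300, n_total=750))
```

The landmark training range grows by one segment per step, while the shifting window keeps a constant length and the classical batch range never changes.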


3 Results 


In the following part, we compare the different approaches as introduced in 
Section 2.2. 


The performance of the different approaches is visualized in the top graph 
of Figure 1. It shows how the MAE evolves over the evaluation horizon. 
All batch learning evaluation methods produce comparable results. Initially, 
performance degrades slightly and then improves continuously. By comparison,
the OML approach achieves consistent results and outperforms the batch
evaluations over the entire horizon.


The second diagram in Figure 1 shows a comparison of the computation times
of the different methods. As assumed, the landmark and shifting-window
methods show a continuous increase in computation time because the models
need to be retrained at each evaluation iteration. In contrast, the conventional
batch learning approach exhibits a much lower processing time because of
its single model training phase. The OML algorithm achieves time-efficient
results as well. This is because OML updates models incrementally, rather
than training from scratch with each evaluation.


Figure 1: The MAE, computation time, and memory consumption of different approaches for each
evaluation step. The MAE plot shows an overlap of the graphs of all batch machine learning
methods. In the plots for computation time and memory consumption, the curves of landmark and
shifting window, as well as the OML and the classical batch method overlap.


The lowest graph of Figure 1 shows the memory consumption. Here, the 
OML approach also delivers comparable results to the basic batch approach. 
However, it should be emphasized again that the batch approach’s memory 
consumption only takes place during the training step, and the remaining con- 
sumption is negligible. This fact is also visualized by the graph. In the first 
evaluation step, the memory consumption for the classic batch method drops 
towards zero. The shifting window and the landmark approach perform com- 
parably poorly. This is mainly due to the generated model, which must be built 
again in each iteration. 


Furthermore, the assertions regarding the superior performance of the OML 
algorithms have been statistically substantiated by a one-sided t-test. This 
test demonstrates that the average deviation of predictions made by the OML 
algorithm from the actual values is significantly smaller than that observed in 
predictions from all batch learning approaches. 


Proc. 33. Workshop Computational Intelligence, Berlin, 23.-24.11.2023


4 Conclusion and Discussion 


The OML algorithms outperformed the classical approaches not only in terms 
of memory consumption and computation time, but they also achieved signif- 
icantly better results in terms of prediction accuracy. This can be attributed to 
the algorithms’ enhanced responsiveness to concept drift, a decisive advantage 
especially in the domain of production machinery. Further improvements to the 
results presented here could be realized through additional experiments, partic- 
ularly with surrogate model based optimization of the hyperparameters. This 
evidence suggests that OML algorithms should undoubtedly be considered in 
the development of CAAI for CPPS. 
1 Introduction


Traditional machine learning paradigms depend on the availability of labeled
data, a luxury that is often not available in real-world scenarios. In domains
such as industry, healthcare, autonomous systems, and finance, a massive
amount of unlabeled data is produced every day. As the demand for accurate
and robust models to deal with this data grows, the inefficiency and cost of
manual labeling motivate the research field of active learning [1].


Active learning describes an efficient and effective way of selecting the most 
valuable data samples for labeling. The value of a sample is defined by a crite- 
rion which is unique to each active learning strategy. This selection reduces the 
amount of manual labeling, optimizes resource allocation, and enhances model 
performance in real-world scenarios. This makes active learning a pivotal tool 
for domains where labeled data is scarce or costly. A common criterion for
active learning methods is the informativeness of a data sample. This is measured
by the uncertainty of a Machine Learning (ML) model in its prediction [2]. 
The uncertainty estimation is dependent on the ML-model, which motivates 
this work to explore the quality of different ML-models and their uncertainty 
estimation methods for active learning. Additionally, the computational effort 
of different uncertainty estimators is explored, because there often is a trade- 
off between the accuracy and the computational effort of an uncertainty esti- 
mation method. For instance, probabilistic techniques such as Fully Bayesian 
Gaussian Processes (FB-GPs) and Bayesian Neural Networks provide accurate
uncertainty estimates at the cost of computational complexity due to parameter
sampling from an intractable distribution, commonly done with Markov Chain
Monte Carlo (MCMC) sampling [3]. Neural Networks and their softmax-layer
probabilities, on the other hand, provide less accurate and often overconfident
uncertainty estimates [4] but add no additional computational effort.


2 Related Work 


Active learning describes the process of selecting unlabeled data samples. An 
active learner is characterized by a predefined budget, representing the quantity 
of data samples it can actively select, and a criterion quantifying the value 
of a given data sample. The active learner chooses the most valuable data 
samples, measured by the criterion, and requests labels for these selections. 
The criterion typically relies on either the spatial distribution of data samples 
[5] to maximize the training set diversity or utilizes uncertainty estimates [2] 
to minimize regions in the input-space characterized by high predictive
uncertainty. Leveraging uncertainty estimates requires the use of an ML-model
capable of quantifying predictive uncertainty. By iteratively selecting new 
samples, the active learner enhances its performance, often achieving better 
results with fewer labeled examples compared to traditional passive learning 
methods. 


The research field of active learning is divided into three subfields [6]: Pool- 
based Sampling is the most common subfield of active learning. It involves 
selecting instances for labeling from a fixed pool of unlabeled data. The al- 
gorithm ranks instances within this pool based on its selection criterion. The 
selected instances are then labeled and added to the training set. Membership
Query Synthesis involves generating label queries synthetically based on the
learner’s current knowledge, instead of selecting instances from an existing dataset.
These queries are designed to be informative and help the model to learn
more effectively. Stream-based Selective Sampling focuses on scenarios where 
data arrives in a continuous stream, and labeling resources are limited. In this 
subfield, the active learning algorithm processes data instances one by one as 
they arrive. It decides on-the-fly whether to label the current instance or wait
for a more valuable one based on the selection criterion. This work focuses on 
the pool-based sampling approach to active learning. 


The concept of uncertainty, which can be used as an active learning criterion, 
is commonly divided into two parts: Epistemic uncertainty refers to a sys- 
tematic uncertainty and results from incomplete or missing knowledge. This 
uncertainty is caused by factors like small data sets and other influences arising 
from an incomplete and potentially faulty data source, as well as incomplete 
knowledge about the process being modeled. Aleatoric uncertainty represents 
the part of uncertainty that cannot be reduced, including statistical relationships 
such as noise or fundamentally random connections in the data [7]. When 
using uncertainty as a criterion for active learning, especially the epistemic 
uncertainty is of interest, because data samples with high epistemic uncertainty 
carry the information that can increase the performance of the model [8]. 


2.1 Gaussian Processes 


One machine learning method that provides an uncertainty estimate is a Gaus- 
sian Process (GP). GPs are a good model choice for active learning because 
they are well-suited for smaller datasets, a crucial characteristic due to the lim- 
ited amount of training samples during the active learning process. However, 
there are two issues associated with GPs, when used for active learning. Firstly, 
they rely on proper hyperparameter selection, which poses a challenge as the 
costly labeling of data samples makes it difficult to justify withholding samples 
for testing and hyperparameter tuning. Consequently, hyperparameters must 
often be chosen based on heuristics or expert knowledge, which may not be 
available or applicable in all cases. Secondly, GPs’ uncertainty estimates do not
differentiate between epistemic and aleatoric uncertainty.


These issues have motivated Riis et al. [2] to explore the use of FB-GPs for 
active learning. The concept involves sampling the hyperparameters, com- 
monly noise and lengthscale, from a posterior distribution conditioned on the 
training samples. This directly addresses the first issue and enables the creation 
of an ensemble of GPs. Combined with the law of total variance
$V(y \mid x) = V(E[y \mid x]) + E[V(y \mid x)]$, this approach allows for the decomposition of total
uncertainty into epistemic uncertainty $V(E[y \mid x])$ and aleatoric uncertainty $E[V(y \mid x)]$. This
results in a significantly more accurate and useful uncertainty estimate for
active learning.
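
The decomposition can be made concrete with a small numerical sketch. It assumes that each member of the GP ensemble (one per sampled hyperparameter set) returns a predictive mean and a predictive variance at the same input point. The function name `decompose_uncertainty` and the use of the population variance over ensemble members are assumptions made for illustration, not details taken from [2].

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split total predictive variance at one input via the law of total variance.

    means[k], variances[k]: predictive mean and variance of ensemble member k.
    """
    aleatoric = fmean(variances)   # E[V(y|x)]: mean of the predicted variances
    epistemic = pvariance(means)   # V(E[y|x]): variance of the predicted means
    return {
        "aleatoric": aleatoric,
        "epistemic": epistemic,
        "total": aleatoric + epistemic,
    }


# Three ensemble members that disagree on the mean at the same input point.
result = decompose_uncertainty(means=[1.0, 2.0, 3.0], variances=[0.1, 0.2, 0.3])
```

When all members agree on the mean, the epistemic term vanishes and only the averaged noise estimate remains.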


2.2 Random Forests 


Random Forests are a widely used ensemble learning technique that uses
multiple decision trees to make accurate and robust predictions. They are able
to model diverse datasets without requiring extensive hyperparameter tuning 
[9]. Additionally, [10] shows that they are the best ML-method for small to 
medium sized real-world datasets. These features make Random Forest a good 
candidate for an ML-model in active learning problems. 


In [11] different ways to estimate the uncertainty of a prediction of a Random
Forest are described. First, they argue that the standard approach of estimating 
uncertainty for an ensemble, taking the variance of the individual predictions, 
is not suitable for Random Forest. This is due to the different training sets 
and feature selections in the individual trees. Then, two suitable methods for 
estimating the uncertainty are presented: The first method, denoted as the Jack- 
knife Estimate, calculates the Leave-One-Out Error implicitly [12] and uses its 
average as the uncertainty. This is done by computing the difference between 
the average prediction made by trees not trained on a particular sample and 
the average prediction generated by the entire ensemble. The second method, 
referred to as the Infinitesimal Jackknife estimator [13], introduces a novel 
approach. It down-weights each training sample by an infinitesimal amount
and computes the variance of a prediction over all training samples. The 
variance can be interpreted as the uncertainty. To enhance the reliability and 
accuracy of uncertainty estimates from both the Jackknife and the Infinitesimal 
Jackknife estimator, the authors present unbiased versions of these methods. 
These enhancements correct the inherent upward bias observed in the initial 
estimations, ultimately resulting in better-calibrated uncertainty assessments. 
These estimates do not distinguish between epistemic and aleatoric uncertainty. Although
a method exists to differentiate between aleatoric and epistemic uncertainty
specifically for Random Forests [14], it is designed for classification tasks and
does not easily extend to regression problems.
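
The basic Jackknife Estimate described above can be sketched as follows. The sketch assumes the common jackknife-after-bootstrap form, in which the squared differences between the out-of-bag average and the full-ensemble average are summed over all training samples and scaled by (n - 1) / n; the exact estimator and the bias correction used in [11] may differ. The function name, the argument layout, and the choice to skip samples that appear in every tree are hypothetical.

```python
def jackknife_variance(tree_preds, inbag_counts):
    """Jackknife-after-bootstrap variance estimate at one query point.

    tree_preds[b]:      prediction of tree b at the query point
    inbag_counts[b][i]: how often training sample i occurs in the
                        bootstrap sample used to grow tree b
    """
    n_trees = len(tree_preds)
    n_samples = len(inbag_counts[0])
    mean_all = sum(tree_preds) / n_trees

    total = 0.0
    for i in range(n_samples):
        # Trees that never saw sample i act as an implicit leave-one-out model.
        oob = [tree_preds[b] for b in range(n_trees) if inbag_counts[b][i] == 0]
        if not oob:
            continue  # assumption: skip samples that are in-bag for every tree
        total += (sum(oob) / len(oob) - mean_all) ** 2
    return (n_samples - 1) / n_samples * total


# Toy example: four trees, two training samples.
variance = jackknife_variance(
    tree_preds=[1.0, 2.0, 3.0, 4.0],
    inbag_counts=[[2, 0], [0, 2], [1, 1], [2, 0]],
)
```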


2.3 Neural Networks 


Neural Networks are widely used models, especially for large datasets [15]. 
They also provide probabilities in their predictions when trained on a classifi- 
cation problem, which can be used as an uncertainty estimate. However, the 
uncertainty (entropy of the predicted class probabilities) of these predictions is 
both poorly calibrated [4] and not applicable to regression problems, as Neu- 
ral Networks trained on regression problems only provide point predictions. 
One approach to obtain uncertainty estimates is by training an ensemble of 
Neural Networks [16], each initialized with different weights. This results 
in different Neural Networks converging to various local minima of the loss 
function. However, training multiple Neural Networks can be computationally 
expensive. Thus, implicit ensembling techniques that require the training of 
only a single Neural Network and yield well calibrated uncertainty estimates 
are presented. 


The first technique, known as Dropout, is a well-established regularization 
technique applied during the training [17]. Dropout operates by randomly 
setting the output of a fraction of neurons in each layer to zero with a specific 
probability, effectively excluding their contribution to the network’s output.
Importantly, this random dropout of neurons varies during each training iteration,
encouraging the development of redundant representations and mitigating the 
risk of overfitting. Notably, during testing or inference, Dropout is deactivated 
to allow the model to utilize its full predictive capability. However, when 
Dropout is retained and applied during testing, it can be demonstrated that 
the mean and variance of multiple forward passes approximate the behavior 
of Bayesian Neural Networks [18]. The underlying concept of the uncertainty 
estimation is that the model has formed redundant representations for test sam- 
ples closely aligned with the training data, resulting in a lower variance in the 
predictions. Conversely, for samples more distant from the training data, the 
model lacks this redundancy, leading to a higher variance in predictions as they 
become heavily reliant on specific neurons. 
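
The idea of keeping Dropout active at test time can be sketched in a few lines. The example uses a tiny one-hidden-layer network with randomly drawn (untrained) weights and a hyperbolic tangent activation, purely to show the mechanics of repeated stochastic forward passes; the helper names `random_params`, `forward`, and `mc_dropout_predict`, the inverted-dropout rescaling, and all numeric settings are assumptions for illustration.

```python
import math
import random
from statistics import fmean, pvariance


def random_params(n_in, n_hidden, seed=0):
    """Draw untrained weights for a one-hidden-layer network (illustration only)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0


def forward(x, w1, b1, w2, b2, p_drop, rng):
    """One stochastic forward pass with Dropout applied to the hidden layer."""
    hidden = []
    for w_row, b in zip(w1, b1):
        h = math.tanh(sum(w * xi for w, xi in zip(w_row, x)) + b)
        if rng.random() < p_drop:
            h = 0.0                 # neuron dropped for this pass
        else:
            h /= 1.0 - p_drop       # inverted-dropout rescaling
        hidden.append(h)
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def mc_dropout_predict(x, params, p_drop=0.05, n_passes=100, seed=0):
    """Mean and variance over several forward passes with Dropout left on."""
    rng = random.Random(seed)
    outputs = [forward(x, *params, p_drop, rng) for _ in range(n_passes)]
    return fmean(outputs), pvariance(outputs)


params = random_params(n_in=1, n_hidden=8, seed=1)
mean, variance = mc_dropout_predict([0.3], params, p_drop=0.5, n_passes=200)
```

With `p_drop=0.0` every pass is identical and the variance collapses to zero, which corresponds to the usual deterministic inference.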
Table 1: Synthetic Datasets from [23]


Name        No. of Features   Noise   Input Space
Gramacy1d   1                 0.1     [0.5, 2.5]
Higdon      1                 0.1     [0, 20]
Gramacy2d   2                 0.01    [-2, 6]^2
Branin      2                 11.32   [-5, 10] x [0, 15]
Ishigami    3                 0.187   [-π, π]^3
Friedman    5                 0.1     [0, 1]^5
Hartmann    6                 0.01    [0, 1]^6


The second uncertainty estimation method utilizes DropConnect, which is
conceptually similar to Dropout. Also initially developed as a regularization
technique for Neural Networks [19], DropConnect sets connections between neurons
to zero, instead of the output of the neurons. The authors of [20] experimentally
show that the variance of the predictions of multiple DropConnect Neural Networks
provides better calibrated uncertainty estimates than the variance of a Dropout
Neural Network.


The third method, Local Ensembles [21], approximates the variance of an 
ensemble of equally competent predictors without explicitly constructing the 
ensemble. To achieve this, the weights of the Neural Networks are perturbed
in the direction of the smallest eigenvectors of the Hessian of the loss function. 
These eigenvectors represent directions of low curvature, indicating flat regions 
on the loss landscape, where weight-perturbations have minimal impact on 
the loss. To efficiently approximate these eigenvectors, Lanczos iteration is 
employed [22]. 


3 Experimental Procedure 


3.1 Datasets 


We use a comprehensive set of datasets, comprising seven synthetic functions
(Tab. 1) and eleven real-world datasets (Tab. 2). Six of the seven synthetic
functions, described in [23], are employed in benchmarking FB-GP based
active learning methods [2]. These synthetic functions serve to demonstrate
the models’ capabilities in addressing common challenges: The function
Gramacy1d (Fig. 1a) assesses the models’ ability to distinguish noise from signal.
Higdon (Fig. 1b) and Gramacy2d (Fig. 1c) feature both linear and non-linear
regions. The other synthetic functions are of higher dimension and
some are characterized by strong non-linear behavior, enabling the evaluation
of the models’ performance in complex scenarios. The Friedman function has
a well-established reputation in regression problems, having been previously
employed by Friedman et al. [24] and Breiman [25]. The datasets for every
function are created by drawing 2000 samples uniformly from the defined input
space and by adding Gaussian white noise to the function output.


Figure 1: Visualizations of the 1d and 2d synthetic functions (taken from [23]):
(a) Gramacy1d, (b) Higdon, (c) Gramacy2d, (d) Branin


Table 2: Real World Datasets from [26]


Name                              No. of Samples   No. of Features
auto-mpg                          397              7
concrete-data                     1030             8
cps-wages                         534              18
housing                           452              13
no2                               500              7
pm10                              397              7
real-estate-valuation             414              6
slump-test-slump                  103              7
slump-test-flow                   103              7
slump-test-compressive-strength   103              7
winequality-red                   1599             11
winequality-white                 4898             11
yacht-hydrodynamics               308              6
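
The construction of a synthetic dataset (uniform sampling from the input space plus Gaussian white noise on the output) can be sketched as follows. The example uses the Friedman function in its standard five-dimensional form, which is assumed here to match the variant from [23]. The helper `make_dataset` is hypothetical, and interpreting the noise value from Tab. 1 as a standard deviation (`noise_std`) is an assumption.

```python
import math
import random


def friedman(x):
    """Standard Friedman function on [0, 1]^5 (assumed to match [23])."""
    return (
        10.0 * math.sin(math.pi * x[0] * x[1])
        + 20.0 * (x[2] - 0.5) ** 2
        + 10.0 * x[3]
        + 5.0 * x[4]
    )


def make_dataset(f, bounds, n_samples=2000, noise_std=0.1, seed=0):
    """Draw inputs uniformly from `bounds` and add Gaussian noise to f(x)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples)]
    y = [f(x) + rng.gauss(0.0, noise_std) for x in X]
    return X, y


X, y = make_dataset(friedman, bounds=[(0.0, 1.0)] * 5)
```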


While synthetic datasets offer valuable insights into specialized problem
scenarios, the evaluation of active learning methods on real-world datasets is
also relevant, because real-world datasets either combine multiple synthetic
scenarios or exhibit behavior not covered by synthetic functions. To this
end, eleven real-world datasets that are consistent with those used by Wu et
al. [5] are incorporated. These datasets are sourced from the UCI and CMU
StatLib repositories [26]. An overview of the datasets and their number of
data samples and features is given in Tab. 2.


The inputs and outputs of all datasets are normalized to have zero mean and
unit standard deviation. Categorical features are one-hot-encoded for
compatibility with the ML-models used. The slump-test dataset contains three
target values. To handle this, each target is treated as an individual dataset.


3.2 Active Learning Methods 


Our evaluation focuses on three primary ML-models: FB-GPs, Random Forests, 
and Neural Networks. For FB-GPs, three distinct uncertainty estimation ap- 
proaches are employed: 


• Mean of the predicted variances: Represents aleatoric uncertainty
• Variance of the predicted means: Represents epistemic uncertainty
• Combination of the mentioned criteria: Represents total uncertainty


Random Forests are evaluated using the Jackknife and Infinitesimal-Jackknife 
estimators, along with their bias-corrected counterparts. Neural Networks un- 
dergo assessment with three presented uncertainty estimation methods: Drop- 
out, DropConnect, and Local Ensembles. Additionally, a passive learning 
baseline for each machine learning method is provided, which randomly selects 
the same amount of data samples as the active learning methods. 


Each method starts with an initial training set. The size of this set is equal 
to the dimensionality of the dataset. In case of a 1d- or 2d-dataset it consists 
of three samples. The samples of the initial set are selected as follows: The
first sample is selected as the one closest to the mean of the input of the data. 
Subsequently, the remaining initial training samples are chosen based on their 
maximum distance to their nearest training sample. This is done to provide a 
deterministic initial training set with good diversity to the ML-models. This 
way, the starting conditions are equal and not dependent on random selection 
of initial data samples. Next, we train an ML-model and compute uncertainties 
for each unlabeled data sample. The sample with the highest uncertainty is 
then added to the training set, and the ML-model is retrained. This selection 
process is iterated 50 times. 
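
The selection procedure described above can be sketched as follows. The function names are hypothetical, the initial set size is read from the text as max(3, number of features), and `distance_uncertainty` is only a stand-in for "retrain the ML-model and compute its uncertainty for every unlabeled sample"; it is not one of the estimators evaluated in this work.

```python
import math


def initial_indices(X, n_init):
    """Deterministic initial set: the sample closest to the input mean,
    then repeatedly the sample farthest from its nearest selected sample."""
    n_init = min(n_init, len(X))
    dim = len(X[0])
    mean = [sum(x[j] for x in X) / len(X) for j in range(dim)]
    chosen = [min(range(len(X)), key=lambda i: math.dist(X[i], mean))]
    while len(chosen) < n_init:
        rest = [i for i in range(len(X)) if i not in chosen]
        chosen.append(
            max(rest, key=lambda i: min(math.dist(X[i], X[c]) for c in chosen))
        )
    return chosen


def distance_uncertainty(X, labeled, pool):
    """Placeholder score: distance to the nearest labeled sample."""
    return [min(math.dist(X[i], X[j]) for j in labeled) for i in pool]


def active_learning(X, uncertainty, n_queries=50):
    """Pool-based loop: add the most uncertain pool sample in each iteration."""
    labeled = initial_indices(X, max(3, len(X[0])))
    pool = [i for i in range(len(X)) if i not in labeled]
    for _ in range(min(n_queries, len(pool))):
        scores = uncertainty(X, labeled, pool)  # model retraining would happen here
        pick = pool[scores.index(max(scores))]
        labeled.append(pick)
        pool.remove(pick)
    return labeled


X_toy = [[0.0], [1.0], [2.0], [3.0], [10.0]]
order = active_learning(X_toy, distance_uncertainty, n_queries=2)
```

Because both the initial set and the tie-breaking are deterministic, repeated runs on the same pool start from identical conditions.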


3.2.1 Training of ML-Methods 


Hyperparameter tuning for machine learning methods typically relies on a sep- 
arate test or validation set. However, in the active learning scenario with limited 
data availability, this approach is impractical. Therefore, hyperparameters are 
chosen based on established rules of thumb or taken from other work that dealt
with similar problems. 


For FB-GPs, the hyperparameters lengthscale and noise are sampled from a 
distribution, requiring the user to specify only the ensemble size, which is set 
to 800 to manage computational resources. For Random Forests, hyperparame- 
ters such as tree size and the number of input variables considered in each split 
are configured in alignment with Breiman’s original Random Forests paper 
[9], which shows minimal sensitivity to the second hyperparameter. Neural 
Networks depend on a multitude of hyperparameters, with the network-size 
being the most critical. Given the limited number of training samples in active 
learning (at most 50 to 70), we chose one-hidden-layer networks with 50 neu- 
rons, inspired by Tohme [27]. The hyperbolic tangent is used as the activation 
function, and networks are trained with the ADAM optimizer (learning-rate: 
0.01). To promote stability of the training results, network weights from the 
previous active learning iteration are used for initialization. 


To mitigate the risk of overfitting, regularization techniques are employed. 
Specifically, Dropout and DropConnect are incorporated, which are already 
integrated into their respective uncertainty estimation methods. Dropout is
also applied to the Neural Networks that are used for the Local Ensembles- 
method. The proportion of neurons or connections that are dropped is set to 
0.05, similar to findings from [17]. Additionally, we implement early stopping 
based on training error to accelerate the training process when convergence is 
reached. This approach is particularly advantageous when combined with the 
weight initialization from the preceding iteration, as the model is likely near a 
local minimum. 


3.3 Evaluation Process 


In other work, the quality of an active learning method is commonly assessed
by providing learning curves, which illustrate the final performance of the
ML-model as well as the speed and the stability of the learning process. Due to
space constraints, we only show the most important metric, which is the quality
of the ML-model after the final number of training samples is acquired. For
regression problems, the quality is commonly assessed by the Root Mean
Square Error (RMSE). We use the normalized-RMSE, which allows comparisons
across multiple datasets:


$$\text{normalized-RMSE} = \sqrt{\frac{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\mathrm{Var}(y)}}$$


The normalized-RMSE is computed over the remaining unlabeled data samples 
from the pool, demonstrating the generalization capabilities of the model. The 
normalized-RMSE is also computed for the passive learning baselines, which 
randomly select the same amount of training samples as the active learning 
methods. This allows for an evaluation of the uncertainty based selection cri- 
terion. All active and passive learning methods are run ten times per dataset. 
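
A minimal sketch of the metric is given below. It assumes that the normalized-RMSE is the square root of the mean squared error divided by the variance of the targets, so that a model which always predicts the target mean scores exactly 1. The function name and the use of the population variance are assumptions.

```python
import math
from statistics import fmean, pvariance


def normalized_rmse(y_true, y_pred):
    """sqrt(MSE / Var(y)): 0 is a perfect fit, 1 matches a constant mean predictor."""
    mse = fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])
    return math.sqrt(mse / pvariance(y_true))


y_true = [1.0, 2.0, 3.0, 4.0]
mean_baseline = normalized_rmse(y_true, [2.5] * 4)
```

Values above 1, as they occur for some datasets in the results, then indicate a model that generalizes worse than predicting the mean of the evaluated targets.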


In addition to comparing the quality of the different active learning methods, 
the computational effort of each method is considered. The average time it 
takes to acquire the final amount of data samples per method is calculated, en- 
suring uniform hardware conditions for all experiments to allow a comparison. 
Table 3: normalized-RMSE after the final active learning iteration on real-world datasets,
averaged over 10 runs. The best method for each dataset is printed bold. The last row
shows the average score over all datasets (RF - Random Forest, NN - Neural Network)


                 FB-GP           RF              NN
Dataset          Random  Best    Random  Best    Random  Best
auto-mpg         0.18    0.12    0.18    0.16    0.23    0.25
concrete         0.28    0.25    0.40    0.48    0.33    0.41
cps-wages        1.04    0.90    0.85    0.78    1.06    1.04
housing          0.27    0.12    0.17    0.08    0.29    0.39
no2              0.74    0.60    0.64    0.65    0.72    0.74
pm10             1.06    0.88    0.84    0.81    1.11    1.12
real-estate      0.46    0.38    0.41    0.38    0.59    0.60
slump-slump      0.83    0.58    0.75    0.45    0.69    0.76
slump-flow       0.63    0.51    0.63    0.48    0.67    0.68
slump-strength   0.01    0.00    0.38    0.49    0.12    0.10
wine-red         0.93    0.91    0.76    0.75    0.98    1.11
wine-white       0.97    0.94    0.80    0.83    1.04    1.23
yacht            0.01    0.00    0.19    0.05    0.05    0.03
average          0.57    0.48    0.54    0.49    0.61    0.65


While assessing complexity using O-notation is of interest, it poses challenges
for the MCMC methods, which are used for FB-GPs, as shown in [28].


4 Results 


4.1 Real-World Datasets 


Tab. 3 shows the normalized-RMSE of the model after the final active learning 
iteration for the real-world datasets. For every ML-method the normalized- 
RMSE of the passive learning baseline (Random) and the best uncertainty 
estimator per dataset are shown. This is done to make the results more clear 
and to enable the comparison of the maximum potential of each ML-model for 
active learning. In the last row the average RMSE per method is shown for an 
overall comparison. 
Table 4: Average RMSE per uncertainty estimator on real-world datasets (J - Jackknife estimator,
IJ - Infinitesimal Jackknife estimator)


FB-GP   aleatoric     epistemic     total           Random
        0.57          0.51          0.49            0.57
RF      J             unbiased J    IJ              unbiased IJ   Random
        0.52          0.52          0.51            0.51          0.54
NN      DropConnect   Dropout       LocalEnsemble   Random
        0.74          0.73          0.69            0.61


The average score illustrates a comparable performance between active learning
with Random Forests and FB-GPs, while active learning with Neural Networks
performs worst. The RMSE per dataset reinforces this observation: Neural
Networks do not perform best on any dataset, whereas Random Forests perform
best on roughly the same number of datasets as FB-GPs. The performance of
both methods is also very similar on every dataset, with two major exceptions:
for the concrete and the slump-strength datasets, the FB-GPs substantially
outperform the Random Forests. This could be a result of poor
hyperparameter choice or because Random Forests are a bad model choice for 
those particular datasets. Additionally, the random baseline (passive learning) 
of Random Forests performs better in both cases, indicating sub-optimal un- 
certainty estimates unable to pinpoint regions benefiting from additional data 
samples. This trend of poor uncertainty estimates is further evident for Neu- 
ral Networks, where, barring three datasets, the passive learning consistently 
outperforms the best active learning strategy. Although the papers introducing
these methods show that they produce reliable uncertainty estimates, the
results suggest that these uncertainty estimators are ill-suited for active
learning. A potential explanation is the generally poor generalization of the
Neural Networks due to suboptimal hyperparameter selection. However, to
prove the hypothesis that a poorly fitted model causes poor uncertainty
estimates, dedicated experiments need to be conducted.


Tab. 4 shows the average scores per uncertainty estimator for the real-world
datasets. This enables the comparison between the different uncertainty es- 
timators of the specific ML-models. The top section emphasizes the advan- 
tage of distinguishing the uncertainty into its epistemic and aleatoric compo- 
nents, because active learning with FB-GPs based on the epistemic uncertainty 
demonstrates better results than the aleatoric-based approach. Interestingly, 
the total uncertainty criterion proves most effective for active learning, even 
though both aleatoric and epistemic uncertainties contribute equally. 


Comparing random baselines indicates Random Forests as the optimal model 
choice for real-world datasets, consistent with findings by [10]. However, FB- 
GP uncertainty estimates exhibit a greater performance increase (compared to 
the passive learning baseline) than Random Forests. This indicates that active 
learning with Random Forests could benefit from a better uncertainty estimate 
that differentiates between epistemic and aleatoric uncertainty. 
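

The gain over the passive baseline can be made explicit from the averages in Tab. 4. The following minimal sketch computes the relative RMSE reduction of the best estimator per ML-model versus Random; the dictionary layout and the function name `relative_gain` are illustrative assumptions and not part of the original experiments.

```python
# Average RMSE per uncertainty estimator on real-world datasets (values from Tab. 4).
TAB4 = {
    "FB-GP": {"aleatoric": 0.57, "epistemic": 0.51, "total": 0.49, "Random": 0.57},
    "RF": {"J": 0.52, "unbiased J": 0.52, "IJ": 0.51, "unbiased IJ": 0.51, "Random": 0.54},
    "NN": {"DropConnect": 0.74, "Dropout": 0.73, "LocalEnsemble": 0.69, "Random": 0.61},
}


def relative_gain(scores):
    """Relative RMSE reduction of the best estimator versus the Random baseline."""
    baseline = scores["Random"]
    best = min(value for name, value in scores.items() if name != "Random")
    return (baseline - best) / baseline


for model, scores in TAB4.items():
    print(f"{model}: {relative_gain(scores):+.1%}")
```

With these averages, FB-GPs gain roughly 14% over passive learning, Random Forests roughly 6%, and Neural Networks lose roughly 13%.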


For Random Forests one can see that the unbiased versions of the Jackknife and 
Infinitesimal Jackknife do not achieve increased performance regarding active 
learning compared to the non-bias-corrected versions. This is an expected 
result as the bias correction is achieved by dividing the uncertainty estimate 
by a constant. For active learning, the data sample with the highest uncertainty
is added to the training data, which does not change when all values are divided 
by a constant. The difference between the Jackknife and the Infinitesimal 
Jackknife is not significant on average. 
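

The invariance argument can be checked with a small example. This is a sketch under the assumption that the bias correction divides every estimate by the same positive constant; the pool values, the constant `c`, and the function name `select_most_uncertain` are hypothetical.

```python
def select_most_uncertain(uncertainties):
    """Return the index of the pool sample with the highest uncertainty."""
    return max(range(len(uncertainties)), key=lambda i: uncertainties[i])


# Hypothetical uncertainty estimates for a pool of five candidate samples.
raw = [0.12, 0.40, 0.07, 0.33, 0.25]

# Bias correction modelled as division by a positive constant (assumed value).
c = 1.7
corrected = [u / c for u in raw]

# The ranking, and therefore the selected sample, is unchanged.
assert select_most_uncertain(raw) == select_most_uncertain(corrected)
print(select_most_uncertain(raw))  # 1
```

Because dividing by a positive constant preserves the ordering of the estimates, both variants select the same sample in every iteration when given the same model.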


The uncertainty estimators for Neural Networks are all, as already mentioned,
not suitable for active learning without further research and adaptation. The
average results show that Local Ensembles outperform Dropout and DropConnect,
which achieve similar results, as can be expected from their similar nature.


4.2 Synthetic Datasets 


The outcomes obtained from the synthetic datasets are presented in Tab. 5. 
They showcase substantial distinctions from the results observed in real-world 
datasets. Notably, Random Forests and Neural Networks demonstrate similar 


Table 5: normalized-RMSE after the final active learning iteration on synthetic datasets, averaged
over 10 runs. The best method for each dataset is printed bold. The last row shows the
average score over all datasets.


                FB-GP             RF              NN
Dataset     Random   Best    Random   Best    Random   Best
Gramacy1d    0.07    0.022    0.05    0.05     0.08    0.09
Higdon       0.06    0.04     0.06    0.05     0.09    0.09
Gramacy2d    0.46    0.02     0.62    0.79     0.79    1.01
Branin       0.13    0.08     0.37    0.50     0.16    0.31
Ishigami     0.42    0.54     0.57    0.60     0.67    0.91
Friedman     0.04    0.02     0.33    0.60     0.22    0.28
Hartmann     0.58    0.46     0.86    0.81     1.16    0.86
average      0.25    0.17     0.41    0.49     0.45    0.51


performances, while active learning based on FB-GPs significantly outper- 
forms both. FB-GPs exhibit superior performance for each dataset, solidifying 
their efficacy. Furthermore, on average the passive learning approach
outperforms the active learning approaches for Random Forests and Neural
Networks, indicating poor uncertainty estimates. Conversely, active learning
with FB-GPs achieves superior results compared to passive learning for all
datasets except the Ishigami dataset, which is characterized by strong
non-linearities. This suggests that complex problems, for which an ML-model
achieves poor generalization, may yield uncertainty estimates that are not
useful for active learning.


Analyzing the passive learning scores shows that, for the synthetic datasets, 
FB-GPs emerge as the optimal model choice. This deviates from the results 
observed with real-world datasets. This difference can be attributed to distinct 
characteristics of the synthetic datasets, notably the presence of homoscedastic 
noise and the continuous nature of their underlying functions. These fea- 
tures inherently favor FB-GPs, as they are able to model continuous output 
well due to their probabilistic distribution over functions. Additionally, FB- 
GPs employ a single noise parameter for the entire input space, predicated on 
the assumption of homoscedastic noise. While this assumption contributes to 
strong performance when homoscedastic noise prevails, it poses challenges in 


Table 6: Average RMSE per uncertainty estimator on synthetic datasets


Model    Uncertainty estimator    Average RMSE
FB-GP    aleatoric                0.34
FB-GP    epistemic                0.17
FB-GP    total                    0.19
FB-GP    Random                   0.25
RF       J                        0.53
RF       unbiased J               0.59
RF       IJ                       0.56
RF       unbiased IJ              0.52
RF       Random                   0.41
NN       DropConnect              0.69
NN       Dropout                  0.56
NN       LocalEnsemble            0.58
NN       Random                   0.45


effectively modeling heteroscedastic noise. However, it is essential to note that 
these favorable characteristics of synthetic datasets are not necessarily given 
for real-world datasets, requiring further experiments to validate the hypothesis 
that the performance of FB-GPs relies on the characteristics of the dataset. 


The average results per uncertainty estimator (Tab. 6) support the suitability
of epistemic uncertainty as an active learning criterion over aleatoric
uncertainty, with the latter performing worse than passive learning. In contrast to
the real-world datasets, the total uncertainty does not yield a further
improvement over the epistemic uncertainty. The uncertainty estimator
performance for Random Forests and Neural Networks deviates from the results
observed on real-world datasets. For Random Forests, the unbiased version of
the Infinitesimal Jackknife yields the best results, while for the Jackknife
estimator the standard version outperforms the unbiased one. In the previous section, we argued
that the unbiased and non-bias-corrected uncertainty estimates for Random 
Forests should theoretically yield similar results. However, due to the highly 
stochastic nature of Random Forests, caused by bootstrapping and random 
feature selection, the sample size of ten active learning runs might not suffice 
to ensure comparable results. This is further underscored by the observation 
that, for the Jackknife estimator, the unbiased version performs worse than 
the biased one, whereas for the Infinitesimal Jackknife estimator, the unbiased 
version fares better. If a significant difference in performance resulted from the 
bias correction, one would expect the difference to be consistent across both 


Table 7: Average Time per uncertainty estimator in seconds


              FB-GP     RF     NN
Real-World     2851      6    137
Synthetic      1579      9    202


estimators, which is not the case. Regarding Neural Networks, DropConnect 
performs notably worse, while Dropout and Local Ensembles deliver roughly 
equivalent performance. 


4.3 Computational Effort 


As emphasized in the introduction, computational efficiency is as important an
aspect as predictive performance when comparing various active learning
methods. Tab. 7 outlines the time in seconds required to select a specific quantity
of data samples, in this case fifty, averaged across all active learning runs for 
each respective ML-model. The results are quite apparent, with active learning 
using Random Forests proving to be the fastest. In contrast, Neural Networks 
are approximately 20 times slower, and FB-GPs are notably slower, ranging 
from approximately 100-500 times slower depending on the dataset. Although 
these results may vary with the amount of parallelization or optimized imple- 
mentations, the trend of the computational effort for the different ML-methods 
is evident given the substantial differences observed. 
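

The slowdown factors can be reproduced directly from Tab. 7. The sketch below expresses each runtime as a multiple of the Random Forest runtime; the dictionary layout and the function name `slowdown_vs_rf` are illustrative assumptions.

```python
# Average time in seconds to select fifty samples (values from Tab. 7).
TAB7 = {
    "Real-World": {"FB-GP": 2851, "RF": 6, "NN": 137},
    "Synthetic": {"FB-GP": 1579, "RF": 9, "NN": 202},
}


def slowdown_vs_rf(times):
    """Runtime of each model expressed as a multiple of the Random Forest runtime."""
    return {model: seconds / times["RF"] for model, seconds in times.items()}


for group, times in TAB7.items():
    ratios = slowdown_vs_rf(times)
    print(group, {model: round(ratio, 1) for model, ratio in ratios.items()})
```

For these values, Neural Networks are roughly 22 to 23 times slower than Random Forests, and FB-GPs roughly 175 (synthetic) to 475 (real-world) times slower.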


Tab. 7 reports the real-world and synthetic datasets separately because their
runtimes differ notably, by approximately 50%. For Random Forests and Neural
Networks, the synthetic datasets are slower, whereas for FB-GPs, the
real-world datasets display slower computation. This difference can be attributed
to the pool size; the synthetic datasets comprise 2000 data samples, exceeding 
the size of most real-world datasets. The ML-model’s predictions are important 
for uncertainty estimation, and processing becomes slower with a larger pool 
size. While this holds true for FB-GPs, their slower performance on real-world 
datasets is primarily due to the higher dimensionality of these datasets. The 
FB-GPs sample one lengthscale-parameter for each input dimension resulting 
in more parameters being drawn by the overall time-consuming process of MCMC
sampling.


5 Conclusion and Future Work 


Our work compares active learning strategies with three Machine Learning 
models and different uncertainty estimates for them. We apply a comprehen- 
sive set of datasets with real-world and synthetic problems to present individual 
strengths and weaknesses of the ML-models and their uncertainty estimates. 


A key result is the superior reliability of uncertainty estimates provided by the
FB-GPs. Their ability to differentiate aleatoric and epistemic uncertainty
contributes to a high-quality active learning performance without the need for
hyperparameter selection. The excellent results on different synthetically created
problems, like noise-signal differentiation, make them a good model choice
when time and computational resources are not of the essence. Random Forests
demonstrate performance comparable to FB-GPs across real-world datasets 
and present a significant speed advantage over them. Additionally, Random 
Forests are robust regarding the choice of hyperparameters, an important
feature for active learning models because tuning is expensive or infeasible.
This makes them the best model choice for efficient active learning
on real-world data. Conversely, Neural Networks are the least favorable ML- 
model, emphasizing the crucial role of hyperparameters that are challenging 
to select heuristically and ultimately impact the uncertainty estimation quality. 
Despite their relatively efficient processing, they fall behind Random Forests 
in terms of computational efficiency. 


Moreover, our investigation underscores the importance of epistemic
uncertainty for active learning. This motivates further research, particularly into
uncertainty estimates for Random Forests, as there is, to the best of our
knowledge, no method that differentiates epistemic and aleatoric uncertainty for
regression problems. Lastly, further research into the interplay between model
characteristics and dataset features is motivated by the fact that FB-GPs excel
significantly on synthetic data.


Abstract


In the biomedical environment, experiments assessing dynamic processes are 
primarily performed by a human acquisition supervisor. Contemporary imple- 
mentations of such experiments frequently aim to acquire a maximum number 
of relevant events from sometimes several hundred parallel, non-synchronous 
processes. Since in some high-throughput experiments, only one or a few 
instances of a given process can be observed simultaneously, a strategy for 
planning and executing an efficient acquisition paradigm is essential. To ad- 
dress this problem, we present two new methods in this paper. The first method, 
Encoded Dynamic Process (EDP), is Artificial Intelligence (AI)-based and
represents dynamic processes so as to allow prediction of pseudo-time
values from single still images. Second, with Experiment Automation Pipeline
ues from single still images. Second, with Experiment Automation Pipeline 
for Dynamic Processes (EAPDP), we present a Machine Learning Operations 
(MLOps)-based pipeline that uses the extracted knowledge from EDP to effi- 
ciently schedule acquisition in biomedical experiments for dynamic processes 
in practice. In a first experiment, we show that the pre-trained State-Of-The- 
Art (SOTA) object segmentation method Contour Proposal Networks (CPN) 
works reliably as a module of EAPDP to extract the relevant object for EDP 
from the acquired three-dimensional image stack. 


1 Introduction 


For the imaging-based assessment of dynamic processes in biomedical settings, 
objects of interest must be identified and relevant events that characterize the 
dynamic process must be recorded during their time of occurrence. Commonly, 
a human operator controls the imaging instrument to examine a biomedical 
sample using a microscope and relevant objects are found in the sample by 
inspection. Alternatively, the operator estimates for each object of interest 
the time at which an event of interest is expected to occur based on previous 
experience and triggers the recording of the event at that time. Nevertheless, 
many contemporary experiments provide several hundred relevant objects that 
can, in principle, be imaged in parallel. Events of interest, however, are non- 
synchronous and the estimation of future event times requires extensive human 
effort, is prone to error, and not necessarily time-efficient. These obstacles can 
result in unnecessarily large amounts of irrelevant data, unnecessary experi- 
mental repeats, or experimental biases inflicted by additional light exposure of 
the sample [18]. To address these obstacles, we present two new methods for 
the automated, real-time planning and execution of such experiments. 


EDP. The traditional method of capturing all data or relying on human ex- 
perience to predict future events is outdated and inefficient. Instead of relying 
on human experience, an accurate and comprehensive model of the dynamic 
process should be created. This model should be capable of uniquely
identifying a relevant object state through a process known as fingerprinting, similar
to how humans do it. In biomedicine, it is crucial that this fingerprint remains
consistent despite contextual changes such as noise, brightness changes, or
affine transformations. By modeling the relationship between the fingerprints,
the dynamic process can be represented in an ordered manner. This
representation serves as an
approximation of relative progress within the dynamic process. This relative
progress can also be interpreted as a relative time, known as pseudo-time. By 
adopting this method, we can achieve efficiency in predicting future events and 
a deeper understanding of the dynamic process. 


EAPDP. To unlock the full potential of the EDP, a well-designed pipeline is
an absolute must. Such a pipeline should be able to recognize specific states 
in the real world, identify relevant objects, and then calculate a pseudo-time 
for those states using the EDP. Armed with this knowledge, the pipeline can 
automatically plan and execute a new state capture for any significant event 
that occurs. Due to the uncertain nature of the EDP predictions, the pipeline 
must be able to respond to unsuccessful recordings and learn from them. This 
is where MLOps [1] comes in. By retraining an existing production model 
in accordance with the live context, MLOps ensures that the pipeline always 
uses a current and accurate model, resulting in a potentially better outcome in 
real-world experiments. 


2 Related Work 


Object Extraction. Computer Vision (CV) offers different approaches, such as
object detection and object segmentation, to identify individual objects in an
image [10] and thus extract them. In the biomedical context, most methods
focus on segmentation [42], with SOTA methods like StarDist [38] and
CPN [46].


Pseudo-time predictions. A first approach for pseudo-time predictions with 
classical, non-Deep Learning (DL) methods was presented in [14]. For extracting
relevant objects from the acquired image, thresholding is used as a classical
CV segmentation method. Then the object’s fingerprint is generated linearly 
with a Principal Component Analysis (PCA) [19, 20]. However, biological 
processes are usually not linear [5]. Therefore, recently, non-linear encodings 
of the dynamic processes using DL methods have become popular [17, 9, 21, 
32]. For example, [21] and [9] present DL approaches to encode cell cycles and 
derive predefined cell phases. However, this classification-based approach does
not directly allow for deriving continuous relations between states, such as the
pseudo-time. Such a continuous relation was modeled with DeepCycle in [32].
DeepCycle is trained in a supervised manner. For this purpose, virtual labels
are calculated based on the fluorescence intensity in specifically labeled
channels of the cells. These virtual labels can then be used as anchor points during
training to determine a (relative) cell state as a pseudo-time. It is important to note that
the assumption that a correlation of fluorescence intensity to cell phase can be 
used is not always true in the biological context. A DL method whose
pseudo-time approach is comparable under the constraints given in this paper
was presented in [17]. There, an autoencoder (AE) is used for pseudo-time
approximation as a Self-Supervised Learning (SSL) approach. A DL model serves
as the encoder of a Variational Autoencoder (VAE) [23]; the pseudo-time is
then determined from its code using Hierarchical Agglomerative Clustering
(HAC) and a Minimum Spanning Tree (MST). However, this approach
also has a few limitations. First, each acquired image must contain exactly one
relevant object. Second, the entire dataset was acquired under comparable
acquisition conditions and contains only identically positioned and oriented
objects, so the model is not able to learn affine transformations [3]
between objects. Both constraints are generally not satisfied for microscopic
images, such as in [27, 36]. Furthermore, this pseudo-time method was not
designed as an End-to-end (E2E) model, which deprives the DL model of the
ability to internally bind affine-transformed objects.


AutoEncoder. Autoencoders are SSL methods to learn a representation from
a given input, such as an image [49, 47]. For example, the autoencoder can
be realized as a Conventional AutoEncoder (CAE) [49] and/or a VAE [23].
Recently, masked autoencoders (MAEs) [15] have become more
popular than CAEs because of their ability to learn better visual
representations [49], using either a transformer-based approach [15] or a
Convolutional Neural Network (CNN)-based approach [47]. Since [47]
demonstrates the higher efficiency of ConvNeXt V2, this model is chosen for this
work. In addition to the pure learning of a visual representation in the form of
a fingerprint, relations between the fingerprints can also be learned, e.g. with 
VAE [17]. 


SSL. Labeled data, which are needed to train a DL model in a supervised manner, are generally
rare in the biomedical domain [13]. There are DataBases (DBs) like BioMed-
Image.io [30] and several challenges with their own datasets [26, 2, 28]. However,
especially for biological datasets with sometimes hundreds of relevant objects 
in an image, the datasets are often limited to the 2D case. Furthermore, in 
the context of this work, labeling the relevant events with pseudo-time stamps 
is only approximate, demanding, error-prone and time-consuming. Therefore, 
unsupervised or SSL methods are often used in the biomedical context [43]. 
In this context, Active Learning (AL) [41] is used to selectively integrate expert
knowledge into the learning process. Since AL aims to keep the number of
interactions to a minimum [7, 33], data-efficient learning is preferable. For
example, existing datasets from a related context can be identified and leveraged
to train more robust models in transfer learning [11, 29, 48]. To allow existing
pre-trained models from similar contexts to be reused directly where possible,
a new concept was developed in [48].


3 Methodology 


3.1 EDP 


For the modeling of the dynamic process, a new concept is introduced with 
EDP. The new concept of EDP is based on an AE and is visualized in Fig- 
ure 1. The basic idea of the new concept is to separate the generation of the 
fingerprint from the learning of the relation as a representation between all 
states. The fingerprint generation is done using a MAE as an evolution of CAE. 


[Figure 1 labels: Encoder, Fingerprint, Decoder, Relation Representation]


Figure 1: Visualization of the EDP model. A 3D object is transformed by an encoder into a
fingerprint (like MAE) and into a relation representation between the fingerprints (like
VAE). The example object is a recording of the DNA channel of a nucleus from an
internal zebrafish embryo dataset. A scale bar of 2 µm is indicated at the bottom of the
input/output image.


Specifically, the SOTA MAE-based method ConvNeXt V2 is chosen. During
the learning process, the aim is to recover the encoded image as completely as
possible, in accordance with SSL. Given the challenges posed by biological objects,
a context-independent representation is required. For this purpose, the images
can be modified during the learning process through Data Augmentation [16]
techniques like rotations, reflections, contrast adjustments, or noise additions.
In addition to the fingerprint, the relation must also be learned as the actual 
modeling of the dynamic process. For this purpose, a VAE-like modeling 
is used by learning the uncertainty v in addition to the circle angle &. The 
assumption is that objects which succeed each other in the dynamic process at
a given relative distance differ by the same ratio in the circle representation.
Such an exemplary circle representation
is shown in Figure 2 using a cell division process of the zebrafish embryo.
The state of the cell directly after cell division is visualized at 0 o’clock, and
the state of the cell just before cell division at 11 o’clock. This corresponds
to a relative distance of ~0.92 (normalized to [0, 1)). According to the model
assumption, this relative distance must also hold for the actual temporal
distance between the two states.
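

The mapping from a position on the circle to a normalized pseudo-time can be illustrated as follows. This is a minimal sketch assuming that the pseudo-time is the angle divided by a full turn; the function names and the clock-to-angle convention are assumptions for illustration and not the EDP implementation.

```python
import math


def angle_to_pseudo_time(angle_rad):
    """Map an angle on the circle to a pseudo-time in [0, 1)."""
    return (angle_rad % (2 * math.pi)) / (2 * math.pi)


def clock_to_angle(hour):
    """Angle of a clock position, with 0 o'clock at angle 0 (assumed convention)."""
    return 2 * math.pi * (hour % 12) / 12


def forward_distance(t_from, t_to):
    """Relative distance travelled forwards along the cycle from t_from to t_to."""
    return (t_to - t_from) % 1.0


start = angle_to_pseudo_time(clock_to_angle(0))   # state directly after division
end = angle_to_pseudo_time(clock_to_angle(11))    # state just before division
print(round(forward_distance(start, end), 2))     # 0.92
```

The modulo in `forward_distance` keeps the distance well defined when the process wraps around the circle, e.g. from a pseudo-time of 0.9 to 0.1.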


3.2 EAPDP 


The new EDP module is integrated as a module into the new MLOps-based 
pipeline EAPDP. The pipeline concept is visualized in Figure 3 and contains 
nine other modules besides the EDP module. Each of these ten modules is 


Figure 2: Example of a 2D feature space representation for an encoded dynamic process. Each
point represents an encoded image. The circle serves as an estimation for the positioning
of points in its vicinity. It is worth noting that due to the presence of uncertainty, the
points may not always be precisely on the circle, but rather in its proximity. For the
seven red dots, example images of cell nuclei from zebrafish embryos at various stages
of cell division are shown. A scale bar of 2 µm is indicated at the bottom of each example
image. The images are from an internal dataset.


briefly described below. The explanation of the modules and their relationships 
to each other is based on the pipeline visualization of Figure 3. 


Microscope setup. In the EAPDP, the microscope is used as an actuator 
to the real-world environment represented by a biomedical sample. For this 
purpose, all microscope components relevant to image acquisition and the 
microscope accessories, such as lasers in the case of a laser scanning
microscope, must be controllable via appropriate interfaces of the specific
microscope setup. In addition, the microscope must be able to react to given
commands like image acquisition or requested meta-information like the objective
position in a standardized way.


Figure 3: MLOps pipeline with the new EDP module. AI-based MLOps modules are marked
with a green background, non-AI-based ones with a blue background. Additionally, all
modules marked with a red border must be newly developed or only partially adapted
from existing methods.


Image Pre-processing. To optimize the analysis of dynamic processes in the 
biomedical environment, the raw images acquired through experiments must be 
pre-processed according to the microscope setup and the context of the targeted 
event. This may involve methods such as cropping, contrast adjustment, or 
denoising. Various libraries, such as Albumentations [4], offer pre-processing 
methods that can be used to improve the quality of the images and optimize
their analysis by other modules in the MLOps pipeline.


Object Extraction. During an acquisition, the relevant object and the sur- 
rounding context are captured. In order to better analyze the object, it is 
necessary to separate it from the surrounding context. The extraction from the 
whole image is done via pre-trained segmentation methods. To find a suitable 
method, we compare the SOTA cell segmentation algorithms StarDist and 
CPN using a microscopic dataset in a first experiment. Importantly, the actual 
pseudo-time determination cannot be performed if both methods’ segmentation 
is insufficient. Therefore, this submodule is of particular importance. Because 
(well) labeled data are generally scarce in the biological context, this work 
evaluates generalization performance during inference with already pre-trained 
models on new, unknown images. Since there are only pre-trained weights for 
2D segmentation for both methods, the dataset was split into 2D images along 
the z-axis. 


EDP. The EDP module receives the extracted object and passes the resulting
pseudo-time to the Experiment planner. To do this, the module is first provided
by the Experiment planner with the appropriate experiment setup. With the
setup, the EDP module can then query existing knowledge, such as
pre-trained models, in the context of data-efficient learning. If no weights are
available, training can also be done with or without AL, as specified by the expert.
After successful training, the model is passed to the DB with the
required metadata for possible further use. Once inference on the
extracted relevant object has been completed, the results are passed
to the Experiment planner. The recorded inference image is also sent to
the Data-efficient Learning module and stored in its DB.
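

The control flow of this module can be sketched as follows. Everything in this example is hypothetical: the databases are modelled as a plain dictionary and list, a model is any callable that maps an object to a pseudo-time, and `run_edp_module` and `dummy_train` are illustrative names rather than part of EAPDP.

```python
def run_edp_module(extracted_object, setup_key, model_db, image_db, train_fn):
    """Return a pseudo-time for the object, reusing stored models when possible."""
    model = model_db.get(setup_key)
    if model is None:
        # No pre-trained weights for this setup: train and store for later reuse.
        model = train_fn(setup_key)
        model_db[setup_key] = model
    pseudo_time = model(extracted_object)
    # Keep the inference input for the data-efficient learning step.
    image_db.append(extracted_object)
    return pseudo_time


def dummy_train(setup_key):
    """Stand-in for training; returns a toy model with outputs in [0, 1)."""
    return lambda obj: (sum(obj) % 10) / 10


model_db, image_db = {}, []
result = run_edp_module([1, 2, 3], "zebrafish_dna", model_db, image_db, dummy_train)
print(result)  # 0.6
```

A second call with the same `setup_key` would reuse the stored model instead of training again, which mirrors the reuse of existing knowledge described above.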


Experiment planner. The Experiment planner is the central module of the
experiment automation. As input, it receives the pseudo-times for the recorded
objects. Based on the experiment context, including the events of interest, the
Experiment planner can plan future acquisitions precisely. Once
the plan is set, the Experiment planner commands the microscope so
that the image captures the object’s state at the right time.
Additionally, it can query the state of the microscope to ensure
that there was no hardware drift, such as when moving to the object position.
All the information about the experiment’s state is then passed on to the User
Interface (UI), so that the expert can monitor all aspects of the experiment.
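

How pseudo-times could drive acquisition planning may be sketched as follows. The example assumes that pseudo-time advances linearly with real time and that the cycle duration is known; both are simplifying assumptions, and `plan_acquisitions`, `cycle_duration_s`, and the object identifiers are hypothetical names rather than part of EAPDP.

```python
import heapq


def time_until_event(pseudo_time, event_pseudo_time, cycle_duration_s):
    """Seconds until an object at pseudo_time reaches event_pseudo_time.

    Assumes pseudo-time in [0, 1) grows linearly with real time.
    """
    remaining = (event_pseudo_time - pseudo_time) % 1.0
    return remaining * cycle_duration_s


def plan_acquisitions(pseudo_times, event_pseudo_time, cycle_duration_s, now_s=0.0):
    """Return (acquisition_time_s, object_id) pairs ordered by acquisition time."""
    queue = []
    for object_id, t in pseudo_times.items():
        due = now_s + time_until_event(t, event_pseudo_time, cycle_duration_s)
        heapq.heappush(queue, (due, object_id))
    return [heapq.heappop(queue) for _ in range(len(queue))]


# Hypothetical pseudo-times predicted for three objects.
predicted = {"nucleus_a": 0.25, "nucleus_b": 0.75, "nucleus_c": 0.5}
plan = plan_acquisitions(predicted, event_pseudo_time=0.5, cycle_duration_s=1200.0)
for due, object_id in plan:
    print(f"{object_id}: acquire in {due:.0f} s")
```

In this toy setting the object already at the event pseudo-time is scheduled immediately, followed by the objects that still need a quarter and three quarters of a cycle. A real planner would additionally have to account for prediction uncertainty and for conflicts when several events fall into the same time window.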


UI. The UI is the interface between the expert and the MLOps pipeline. On
the one hand, simple interactions can be provided, such as displaying meta- 
information or adapting the experimental context, e.g. the cell classes that 
occur. On the other hand, much more complex interactions such as result 
justifications of DL models can be represented through Explainable Artificial 
Intelligence (XAI) [34] or expert knowledge can be brought into the pipeline
within the context of AL. With XAI, the expert should be able to understand 
better the processes in the DL models used and why decisions were made, 
e.g. for event detection. This helps the expert to eliminate potential errors like 
unfavorable experiment settings at an early stage. Such XAI methods can be 
realized using a library like PyTorch Captum [25]. For AL, the expert can
only transfer domain knowledge to the method effectively, and thereby support
it, if the expert can grasp the actual state as well as possible. For example,
points, boxes, or entire segmented regions can be passed
to the method as hints. For this purpose, a custom segmentation module can be
developed based on existing AL labeling platforms like ObiWanMicrobi [40] or
Karlsruhe Image Data Annotation (KaIDA) [37].


Expert (Domain) Knowledge. The domain knowledge contributed by the 
expert to the MLOps pipeline can take several forms. For example, the context
of the experiment with a specific cell class can enable a more efficient event
detection module. Furthermore, knowledge can be injected, e.g. by labeling
in the context of AL. For this, the expert must ensure that the injected
domain knowledge is as correct as possible, because incorrect information
can adversely affect the learning processes in the network.


Data-efficient Learning. To minimize AL interactions with the human
expert, as much existing knowledge as possible is reused. To this end, building
on [48], a new AE-based fingerprinting approach for datasets and Machine
Learning (ML) models is being implemented. For this purpose, existing
knowledge can be queried from a DB according to context-given requirements.
If no data is available, it can be created synthetically, e.g. with
biophysical simulations like Large-scale Atomic/Molecular Massively Parallel
Simulator (LAMMPS) [44].


Microscope control. In order for the planned experiment to be automated 
and performed in real-time, a corresponding software library is needed to
control the microscope. Since its first release in 2010, µManager has been used
for this purpose as one of the SOTA open-source solutions [6, 22, 31, 45].
It is therefore also used in this work.


3.3 Exemplary Use Cases 


Example use cases for the presented EAPDP with the EDP are illustrated by 
image excerpts in Figure 4 below. A first use case is shown in Figure 4a and 
represents the temporal sorting of RiboNucleic Acid (RNA) Polymerase II (Pol 
II) clusters that occur in the nuclei of pluripotent zebrafish embryos. A method 
for this use case has already been presented in [14]. A comparison of the 
pipeline based on classical ML methods with our DL-based EDP method al- 
lows a direct statement about limitations or improvements of our approach. The 
second use-case in Figure 4b is the recording of cell divisions in pluripotent 
zebrafish embryos, where the time of reaching a new division stage and thus 
the regions of an event of interest need to be extrapolated. A final biological 
application from the field of microbiology is presented in Figure 4c. In this 
example, one interesting event could be the state at which n microbes reach the 
recording region. For this purpose, a model of the cell division process built 
with EDP can be used to plan the experiment accordingly and automatically record 
the event of interest at a time t. 


(a) [27] (b) Internal dataset (c) [39] 


Figure 4: Three exemplary images from biological datasets for dynamic processes. Figure 4a 
shows cell nuclei of zebrafish embryos with marked Pol II Ser5P clusters. Figure 4b 
shows the DeoxyriboNucleic Acid (DNA) of zebrafish embryo nuclei. In these first 
two images, a scale bar of 20 µm is shown in the lower left. Figure 4c shows a 
microbial cell division state. 


In addition to these biological use cases, other use cases are also possible, e.g. 
in medicine. For example, by modeling a tumor accordingly, a prediction can 
be made about the relative stage. Consequently, a therapy concept such as 
surgery or medication can be tailored to the patient. 


4 Experiments 


The comparison of segmentation algorithms was performed on Helmholtz AI 
COmpute REssources (HAICORE) resources equipped with Intel Xeon Plat- 
inum 8368 Central Processing Units (CPUs) and an Nvidia A100-40 Graphics 
Processing Unit (GPU) [24]. The operating system utilized was Red Hat En- 
terprise Linux (RHEL) version 8.6. 


4.1 Dataset 


The internal microscope dataset from Figure 4b is used to compare the seg- 
mentation algorithms. This dataset was chosen over the other two example 
datasets from Figures 4a and 4c because of the challenging, frayed structure of 
the nuclei as the relevant image object. Due to individually advanced 
cytokinesis, the fibrillar structure of the nuclei sometimes deviates strongly 
from the typical ellipsoidal shape seen in Figure 4a. As a result, contiguous 
pixel regions are not trivially identifiable and correct boundary segmentation 
is difficult. In addition, a working SOTA solution for microbes like those in 
Figure 4c already exists with microbeSEG [35]. 


For the dataset in Figure 4b, zebrafish embryo DNA was imaged. DNA was 
stained with 1:10000 5’-TMR Hoechst in TDE or glycerol. Confocal z-sections 
were obtained using a commercial instant SIM microscope (iSIM, VisiTech). A 
Nikon 100x oil immersion objective (NA 1.49, SR HO Apo TIRF 100xAC Oil) 
and a Hamamatsu ORCA-Quest camera were used for image acquisition. As is 
common in biology, no labels exist for this dataset. Since a 2D segmentation 
is desired, the 3D images are split into 2D images along the z-axis. 


4.2 Object extraction 


These 2D images were then segmented using each of the two methods. Because no 
labels exist, the results are evaluated qualitatively in the following; they 
are shown in Figure 5. The comparison of the original image in Figure 5a with 
the StarDist prediction in Figure 5b shows that StarDist does not segment 
semantically related objects well, such as the nucleus in the upper area. 
Among the method designs with center prediction, StarDist focused primarily on 
segmenting ellipsoidal objects from [12] and was trained only on these. CPN 
was designed to be more flexible and was additionally trained on a more 
heterogeneous set of non-elliptical cells such as MCF7 from the dataset [8]. 
This leads to better generalization and, in our qualitative evaluation, to 
good initial segmentation performance on this most challenging of our 
datasets from Figure 4. 


Thus, we could show that CPN is a good pre-trained SOTA approach for ex- 
tracting the relevant objects from the 2D decomposition. The 2D segmenta- 
tions can be reassembled back to 3D segmentations in post-processing, e.g. 
using Nearest Neighbor. Based on this, the further submodules of EDP can be 
developed in future work and the presented MLOps pipeline can be built upon 
it. 
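
The nearest-neighbor reassembly mentioned above could look roughly like the following sketch. It is only an illustration under stated assumptions: each z-slice is given as a list of (label, centroid) pairs, objects in adjacent slices are linked when their centroids lie within a hypothetical threshold `max_dist`, and the function name `link_slices` is ours and not part of StarDist or CPN.

```python
import math

def link_slices(slices, max_dist=5.0):
    """Assign a 3D object id to every 2D object by nearest-centroid linking.

    slices: list over z of lists of (label_2d, (y, x)) tuples.
    Returns a list over z of dicts {label_2d: object_id_3d}.
    max_dist is a hypothetical linking threshold in pixels.
    """
    next_id = 0
    result = []
    prev = []  # (object_id_3d, centroid) pairs of the previous slice
    for objects in slices:
        mapping, current, used = {}, [], set()
        for label, centroid in objects:
            # Find the closest not-yet-claimed object in the previous slice.
            best_id, best_dist = None, max_dist
            for obj_id, prev_centroid in prev:
                d = math.dist(centroid, prev_centroid)
                if obj_id not in used and d <= best_dist:
                    best_id, best_dist = obj_id, d
            if best_id is None:  # nothing close enough: start a new 3D object
                best_id = next_id
                next_id += 1
            used.add(best_id)
            mapping[label] = best_id
            current.append((best_id, centroid))
        result.append(mapping)
        prev = current
    return result
```

The linking is greedy in the order in which the objects of a slice are listed, so a real implementation would probably resolve competing matches more carefully, for example by overlap instead of centroid distance.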


(a) Original (b) StarDist [38] (c) CPN [46] 


Figure 5: Comparison of segmentation predictions for the two SOTA methods StarDist [38] and 
CPN [46]. Figure 5a represents the original image, duplicated from Figure 4b. In 
Figure 5b and Figure 5c, the predictions of the pre-trained StarDist and CPN are shown, respectively. 
The predictions are highlighted differently for better visual differentiation depending on 
the method used. 


5 Conclusion and Further Work 


In this work, we argued that, due to the large number of parallel, 
non-synchronous dynamic processes, a novel concept for automated planning and 
execution of two novel DL-based approaches is essential. First, the EDP 
was introduced to model dynamic processes and derive a pseudo-time for 
a given object state. The pseudo-time prediction can then be used with 
the EAPDP for real-time experiment automation. We explained how the EDP is 
realized within the MLOps pipeline by an AE and trained using SSL with 
AL. We also highlighted the key advantages of higher execution speed and 
lower human cost, with user interactions minimized through data-efficient 
learning. Finally, as a first Proof of Concept (PoC), we showed the 
necessary pre-processing step for the EDP, namely extracting the relevant 
objects based on the good inference results of CPN. 


A drawback of the segmentation experiments was the lack of pre-trained weights 
for 3D segmentation. Since the fragmented objects are partially reconnected 
along the z-axis, 3D segmentation could simplify the problem and improve 
accuracy. It will be evaluated as soon as appropriate weights are 
available. In addition, a suitable affine-invariant 3D AE needs to be developed 
for use within the EDP method. In this context, further research is needed to 
investigate whether ConvNeXt V2 is suitable for 3D segmentation, also 
from an efficiency perspective. Finally, the modules of the MLOps pipeline 
must be implemented accordingly. 


1 Introduction 


The heterogeneous nature of energy systems is a major challenge for the 
large-scale implementation of energy management systems (EMS). These energy 
systems can be small energy communities, factories, or large-scale 
supply areas. A consistent challenge across the various forms of energy system 
is the accurate modelling of the generation and storage units as well as 
the prediction of consumer behaviour, both of which are needed to decide which 
energy source to use. While significant work has been put into the automated 
prediction of consumption timeseries, the modelling of generation and storage 
units is still mostly done “by hand” and is reliant on external information 
sources like datasheets. Models created on this basis are extremely 
vulnerable to discrepancies between the external information they are based 
on and the actual unit properties. Such inaccuracies often go unnoticed during 
initial model generation until they cause large errors in the optimization, 
which then require time-consuming analysis and fine-tuning to correct. In this 
paper, an initial approach to automating the modelling of generation and 
storage units is presented. The approach is then tested for its sensitivity to 
different training data sets, since a dependency on high-quality training data 
would severely limit its transferability. 


2 Concept 


The idea of the approach is to create a library of simplified generic models 
for the most common types of generation and storage technologies. These 
models can then be fitted to match the properties of the target units using key 
parameters whose values are identified by an artificial neural network (ANN). 
For this purpose, the ANN is trained on timeseries data that has been created 
using numerous simulated units of one unit type with different properties. The 
defining properties of the simulated units have been distributed to reflect the 
entire feasible parameter range. The trained ANN is then used on measured 
data of the target unit to identify the relevant parameter values to modify the 
generic model. The resulting fitted model should more accurately reflect the 
properties and performance limits of the target than a model based on generic 
datasheet information. Additionally, it would be easy to implement a periodic 
refitting of the model on the most recently measured data. This would allow the 
model to reflect changes due to aging and could even be used as an indicator 
for maintenance scheduling. 


3 Generation of training data 


The performance of an ANN depends on the relation between the data used 
for its training and the target. Since the goal is to recognize the physical 
parameters describing the behaviour of the target unit, the input timeseries must 
be related to, or influenced by, said parameters. Unfortunately, high-resolution 
timeseries measurements of the target unit are rarely available, which forces 
engineers to install temporary measurement equipment that can only 
provide data for a short time range. 

For this paper, a vanadium redox flow battery storage, built as part of the 
research project "Smart Region Pellworm" [1], was used as the target unit. First, 
a model for this unit type was chosen from the available literature [2]. This 
model was further simplified until its behaviour was defined by 6 variable 
parameters. In the next step, plausible parameter ranges were set and discretized 
into steps, resulting in 125 000 possible parameter combinations. These 
combinations provide the labels for the training data. As input variables, 
timeseries for the voltage (U), current (I) and state of charge (SoC) were chosen. 
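
As a sanity check on the size of the label space, such a discretized grid can be enumerated directly. The parameter names and step counts below are purely hypothetical: the text only states that 6 parameters yield 125 000 combinations, and 10 · 10 · 10 · 5 · 5 · 5 is merely one split that produces this number. The parameter ranges are likewise replaced by a normalized interval.

```python
import itertools
import math

# Hypothetical number of discretization steps per parameter (not from the paper).
steps_per_parameter = {"p1": 10, "p2": 10, "p3": 10, "p4": 5, "p5": 5, "p6": 5}

def grid_values(n_steps, low=0.0, high=1.0):
    """Evenly spaced values covering an assumed, normalized parameter range."""
    return [low + i * (high - low) / (n_steps - 1) for i in range(n_steps)]

axes = [grid_values(n) for n in steps_per_parameter.values()]
n_combinations = math.prod(len(axis) for axis in axes)

# Each element of the Cartesian product is one label vector for the training data.
label_iterator = itertools.product(*axes)
first_label = next(label_iterator)
```

Iterating lazily over `itertools.product` avoids holding all label vectors in memory while the corresponding model configurations are simulated one by one.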

In the next step, an initial SoC was selected for each model configuration, 
after which the configurations were run through specific load profiles to 
generate the required timeseries for I, U and SoC. Here, 3 different sets of 
load profiles were used to test the ANN’s dependency on data that is highly 
correlated to the target unit. In set 1, each model configuration was given 
one of 4 historic measured load profiles and the related initial SoCs of the 
target unit to create the training data. In set 2, 10 historic measured load 
profiles and SoCs were used, and in set 3, each model configuration was given 
a randomly generated load profile and starting SoC. This resulted in 3 
separate training data sets. 


4 Parameter recognition 


The ANN used in this work is a basic MLP [3] with 3 hidden layers. All layers 
used the tanh activation function except the output layer, for which a linear 
activation function was used. The layer widths were arranged in descending 
order with the intention of compressing the information contained in the input 
timeseries, 3 times 288 timesteps, down to the 6 target parameters. The 
timeseries and labels were normalized to the ranges -1 to 1 and 0 to 1, 
respectively, before being used in training. During training, overfitting was 
limited by an early stopping function with a patience of 10 steps. 
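
The two preprocessing and training details above can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not state the normalization formula or which quantity the early stopping monitors, so min-max scaling and a generic loss value are assumptions, as are the names `min_max_scale` and `EarlyStopping`.

```python
def min_max_scale(values, low, high):
    """Linearly map values to [low, high] (min-max scaling, an assumption)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]

class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` steps."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_steps = 0

    def should_stop(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_steps = 0  # improvement: reset the counter
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience

# Input size implied by the text: 3 timeseries (U, I, SoC) with 288 timesteps each.
N_INPUTS = 3 * 288
```

Timeseries would be scaled with `min_max_scale(series, -1, 1)` and labels with `min_max_scale(labels, 0, 1)`; `should_stop` would be called once per training step with the current loss.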

The resulting trained ANN were first tested solely on the historic data 
belonging to the load profiles used in the creation of their respective 
training data. It became apparent that the parameter values identified by the 
two ANN trained on historic load profiles of the target unit differed 
significantly from the values of the ANN trained on randomized load profiles. 
In particular, the parameters total vanadium concentration and maximum flow 
speed of the electrolyte are lower than expected in the ANN trained on 
historic data. These parameters, in combination with the SoC, determine how 
many usable vanadium ions can be pumped into the cell, which in turn defines 
the maximum charge and discharge power of the battery. 


Figure 1: Heatmaps depicting the efficiency of the battery system models in relation to the SoC 
and charging (positive) or discharging (negative) power 


To allow for a better comparison between the parameter sets, a second test was 
performed in which the ANN were each given historic data of 50 days, 
and the resulting parameters were averaged and put into individual sets. To 
evaluate the accuracy of the identified parameter sets, they were then used to 
fit 3 generic models to the target battery. The fitted models were then used to 
simulate a test scenario, and from the resulting data the efficiency across the 
full range of possible operating situations was calculated. 


The results are presented in the heatmaps of Figure 1, which show the efficiency 
across the full range of possible SoCs and all charge and discharge speeds. 
The heatmap of Figure 1d was generated during experiments with the target 
unit and represents the real battery properties. 


The heatmap in Figure 1a was generated by the model fitted with data set 1 and 
shows a very early discharge power restriction. Such restrictions can be seen 
in the heatmaps as 0 % efficiency zones. According to this model, it would be 
impossible to discharge the battery further than roughly 50 %. Additionally, it 
shows an artifact near a SoC of 10 to 0 %, which would indicate that further 
discharge would suddenly become possible again. The charging side, however, 
is much better represented: while the charging power restriction near a full SoC 
is significantly overestimated, the remaining field is much closer to reality 
than the discharge field. 


The heatmap in Figure 1b, generated using data set 2, looks very similar to 
Figure 1a even though its training data was based on a larger set of load profiles. 
The most notable improvements are the removal of the artifact near the bottom 
of the discharge field and a slight widening of the possible discharge zone on 
the top. The charging side however shows no improvement and instead exhibits 
a slight expansion of the charging power restriction in the top right corner. 


Figure 1c shows the results of the model fitted with data set 3, whose 
parameters were identified by the ANN trained on randomized data. It produces 
by far the most accurate representation of the target properties. The largest 
improvement can be seen on the discharge side, which, while still 
overestimating the discharge power restriction, now shows a much more 
plausible range of usage. A similar improvement is visible on the charging 
side, where the restricted zone has been significantly reduced and now closely 
resembles the restriction shown by the target unit. Besides the estimated 
power restrictions, another modelling error is visible in the heatmaps. 


All 3 models exhibit a clear overestimation of the battery efficiency. This 
overestimation is more severe during discharge processes. This error would 
cause the model SoC to drift away from the actual SoC over time and thus 
require frequent refreshing with a measured value. 
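
The drift described above can be illustrated with a simple energy-balance sketch. All numbers (capacity, power, efficiencies) are hypothetical and not taken from the paper, and the function `simulate_soc` is ours. The point is only that a model which assumes a higher discharge efficiency than the real unit removes too little energy per step, so its SoC estimate diverges steadily from the real SoC.

```python
def simulate_soc(soc0, power_kw, hours, capacity_kwh, efficiency):
    """SoC trace when discharging at constant output power (simple energy balance)."""
    soc = soc0
    trace = [soc]
    for _ in range(hours):
        # Delivering power_kw for 1 h drains power_kw / efficiency from storage.
        soc -= (power_kw / efficiency) / capacity_kwh
        trace.append(soc)
    return trace

# Hypothetical values for illustration only.
real = simulate_soc(1.0, power_kw=10.0, hours=5, capacity_kwh=100.0, efficiency=0.70)
model = simulate_soc(1.0, power_kw=10.0, hours=5, capacity_kwh=100.0, efficiency=0.80)

# Difference between the model SoC and the real SoC after each hour.
drift = [m - r for m, r in zip(model, real)]
```

With these assumed values the gap grows by the same amount every hour, which is why a periodic reset of the model SoC to a measured value would be needed.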


5 Conclusion 


The paper proposed a data-driven approach to fit generic models to target 
generation or storage units. The approach was tested on the example of a 
vanadium redox flow battery system and further evaluated on its sensitivity 
regarding the required training data. The results show that the best 
performance can be achieved using randomized load profiles and initial SoCs 
for the training data generation instead of historic load profiles of the 
target unit. This has positive implications for the transferability of the 
approach to different target units of the same type. While the generation of 
training data using historic load profiles would have required a new set of 
training data and retraining of the ANN for each target unit, the randomized 
approach could allow the use of one pretrained ANN for all target units of the 
same type, with no retraining necessary. This would also justify putting more 
resources into the training of the ANN, since it would be possible to create 
not only a library of generic models for each unit type but also a library of 
pretrained, ready-to-use ANN to fit them to the desired target, thus 
significantly reducing the expertise required from users. Further research is 
required to determine whether a single ANN can really fit any unit of a single 
type regardless of its scale and to test whether other ANN architectures could 
offer higher accuracy. There is also the possibility that completely random 
load profiles are not the best approach to training data generation, since 
some of the load profiles might end up being nonsensical and their resulting 
data devoid of useful information for training. 


Abstract 


A general design method for heuristics that solve control engineering problems 
is presented. The method can be used with only very little knowledge of 
control engineering. Using this example, the advantages and disadvantages of 
applying AI methods in the engineering sciences are discussed. 


1 Introduction 


We consider well-posed problems, in which all data needed to determine a 
solution are given with the problem statement, and ask: 

"What makes a problem difficult?" Attempts to define the difficulty of problems 
can be found in the literature [1], but in practice these attempts appear to 
be of little use, since solving problems requires different kinds of 
abilities. These abilities are shown in Figure 1: 


1. The problem must be transferred from its colloquial, practical formulation 
into a formal view. 


2. The sought result is computed from the formal view. 


Proc. 33. Workshop Computational Intelligence, Berlin, 23.-24.11.2023 61 


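[Figure 1, diagram: (1) the practical problem is transferred into a formal view, 
here $\dot{x} = Ax + Bu$ with $x(t_a) = x_0$, $x(t_e) = x_e$ and an integral cost 
over $[t_a, t_e]$ that is to be minimal; (2) the solution computation then yields 
the solution.] 


Figure 1: Steps in problem solving 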


The formal view constitutes a horizon of understanding within which 
computations become possible. The problem of finding a control function that 
transfers $x_0$ into $x_e$ is thereby turned into a mathematical problem that 
can be solved within matrix calculus. It is clear that ability (2) is greatly 
increased by computer support and can nowadays often be delegated entirely to 
the computer (cf. [2]), but the goals of Artificial Intelligence (AI) reach 
considerably further: ability (1) is to be extended as well. Since the human 
always remains the one who must tell the computer, in a way the computer 
understands, what is to be done, the human will never be completely replaced 
by the computer; communication with the computer, however, can be simplified. 
By providing computational services for very general views, AI makes it 
possible for humans to no longer capture special details of their question 
and to hand the question to the computer in a form that is very simple for 
them. This is intended to enable humans to solve more complicated tasks 
without deeper expert knowledge. 


In our contribution, this "promise of AI" is put to the test on tasks from 
control engineering. To this end, Section 2 presents a transformation of a 
large class of control engineering tasks into search problems. After a 
discussion of the difficulties in solving search problems in Section 3, 
Section 4 presents lists of principles for solving search problems quickly 
and for coordinating the processors of multi-processor computers. Using these 
lists and a scheme for designing a search method presented in Section 5, 
students without deeper knowledge of control engineering are then able to 
solve tasks from this field. The advantages and disadvantages of this method 
compared with the classical solution methods can thereby be identified. The AI 
method has, in particular, the problem of validation. This problem is 
discussed in Section 6 by comparing the presented method with ChatGPT. 
Section 7 summarizes the ideas for designing a general problem solver. 


2 Transformation of a General Problem into a 
Search Problem 


With the method of generalizing questions, the tasks of control engineering 
can be unified. Since these tasks must be mathematically well-posed problems, 
they satisfy the following definition: 


Definition: A problem is called mathematically well-posed if the following 
questions can be answered: 


1. Where is the search carried out? → S, the search space 
2. What is sought? → P(s), the description of what is sought 


Since the search space S and the description of what is sought, the predicate 
P(s), are known for control engineering problems, they can be transformed 
into a search problem. 
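
The definition above can be written down almost literally in code. The following is a minimal sketch that assumes a finite search space; the names `SearchProblem` and `exhaustive_search` as well as the scalar toy system are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass
class SearchProblem:
    """A well-posed problem: where to search (S) and what to search for (P)."""
    search_space: Iterable[float]        # S
    predicate: Callable[[float], bool]   # P(s)

def exhaustive_search(problem: SearchProblem) -> Optional[float]:
    """Return the first s in S with P(s), or None if no element satisfies P."""
    for s in problem.search_space:
        if problem.predicate(s):
            return s
    return None

# Toy example (assumed): find a constant control value u in a discretized set
# that moves x from x0 to xe in one step of the scalar system x_next = a*x + b*u.
a, b, x0, xe = 0.5, 2.0, 4.0, 3.0
candidates = [i / 10 for i in range(-20, 21)]
problem = SearchProblem(candidates, lambda u: abs(a * x0 + b * u - xe) < 1e-9)
solution = exhaustive_search(problem)
```

Exhaustive enumeration is only the baseline here; replacing it by something faster is what the principles in Section 4 are about.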


In Figure 2, the first column lists the problem types and the second column 
the corresponding specifications of S and P(s). The third column shows the 
unifying notation that covers all of these problems. Especially for 
multivariable systems, the control engineering solution becomes very complex. 


The search problem is to be solved with a uniform solution process, without 
special expert knowledge of control engineering. In the next section, the 
difficulty of this problem is examined first. 


Problem type                           Specification of S and P(s) 
Control engineering: control problem   S = functions $u(t)$ with $t \in [t_a, t_e]$, 
                                       e.g. step functions, polynomials, splines, ... 
                                       P: $u(t)$ transfers $x_0$ into $x_e$ 
                                       and satisfies the constraints 


Figure 2: Transformation of control engineering tasks into a search problem 


3 On the Difficulty of Search Problems 


A problem solution is reached step by step. Figure 3 illustrates the problem 
of the "search for the inn". 


Initially, the inn could be anywhere, and the entire search space would have 
to be searched for it. If, however, there are signposts, we only have to spot 
those within our local field of view and can thus solve the problem easily. 
We define: 


Definition: Problems that can be solved with a local view are called easy 
problems; problems for which no local view exists are called difficult 
problems. 


To justify this definition, let us briefly consider the computational effort 
of solving an optimization problem. Without any simplification of the problem, 
we would have to compare every element of S with every other element, which 
for N elements in S amounts to on the order of exp(N) comparisons. If, 
however, the comparisons can be restricted to small subsets of S with u 
elements, only on the order of exp(u) comparison operations are needed, which 
reduces the computing time by the factor $\exp(N)/\exp(u)$. 


Figure 3: The search for the inn 


We thus see that our definition corresponds conceptually to the distinction 
between P and NP-hard problems. The advantage of our definition, however, is 
that it makes the following theorem provable: 


Theorem: There are difficult problems. 


Proof. The unsolvability of the consensus problem (cf. [3]) shows that not 
every global constraint can be replaced by local constraints. It is therefore 
not always possible to transform a global characterization of what is sought 
into a local characterization that would already be recognizable within a 
local region. 


We recognize that a problem is difficult from the fact that there are 
contradictory, locally incompatible solution strategies. Figure 4 shows the 
packing problem. For the packing problem, there are two strategies that 
contradict each other and cannot be combined into a single instruction that 
can be formulated locally: 


Strategy (1): Fill each package as completely as possible! 
Strategy (2): Distribute the large boxes first! 
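
One possible reading of the two strategies as greedy rules is sketched below. This is our interpretation, not the authors' formulation: strategy (1) is read as "best fit" (each box goes into the package it fills most completely), strategy (2) as sorting the boxes by decreasing size before placing them. The box sizes and the capacity are hypothetical.

```python
def best_fit(boxes, capacity):
    """Strategy (1) read as best fit: put each box where it leaves the least space."""
    packages = []  # each package is a list of box sizes
    for box in boxes:
        fitting = [p for p in packages if sum(p) + box <= capacity]
        if fitting:
            # Choose the fullest package that can still take the box.
            max(fitting, key=sum).append(box)
        else:
            packages.append([box])
    return packages

def largest_first(boxes, capacity):
    """Strategy (2): sort by decreasing size, then use the first package that fits."""
    packages = []
    for box in sorted(boxes, reverse=True):
        for p in packages:
            if sum(p) + box <= capacity:
                p.append(box)
                break
        else:
            packages.append([box])
    return packages

boxes = [2, 5, 4, 7, 1, 3, 8]  # hypothetical box sizes, package capacity 10
n_best_fit = len(best_fit(boxes, 10))
n_largest_first = len(largest_first(boxes, 10))
```

On this single instance the second rule needs 3 packages and the first needs 4. This is only an example of the two rules leading to different packings, not a general ranking of the strategies.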


Figure 4: Pack the boxes shown at the top into as few as possible of the standard packages shown at the bottom 


Analogously, for the search problem we have the incompatible strategies: 

(1) Exploitation: Use your knowledge to avoid having to search everything! 
(2) Exploration: Take all parts of the search space into account! 

For difficult problems, the following holds: 


"No Free Lunch" theorem: For difficult problems, there is no single best 
solution method for the entire problem class. The best method in each case 
must be adapted to the specific problem. 


4 Efficient Search Methods 


4.1 Fast Single-Agent Search Methods 


To construct efficient heuristics, we need principles by which the search can 
be made faster. For this, we must find a reason in the problem statement that 
makes a principle for accelerating the search possible. All of these 
principles can be compiled in a list, which is given below. The completeness 
of this list can of course not be proven mathematically, but only supported by 
argument: 


• An examination of all heuristics known to us yielded no further 
principles. 


• The "What else?" argument. One can consider which possibilities for 
accelerating a search exist in principle and recognize from this that no 
further principles are conceivable. 


The list specifies: 

• the reason that must hold for the problem so that the principle is 
applicable, 


• the solution principle, 
• and examples of methods in which the principle is realized. 


To use the list for designing single-agent search methods, users must first 
check whether the reason for applying a principle is satisfied in their 
specific search problem. If so, they can incorporate the corresponding 
principle into their heuristic by means of a subroutine that realizes it. The 
simplest way is to copy a realized method given in the examples and build it 
into one's own heuristic. An overview of all methods given in the literature 
is of course not possible here. The examples could be extended at will. It 
should also be noted that many methods realize several principles at once, 
i.e. possibly also ones that do not reduce the search time for the problem 
at hand. 


Principles of fast solution methods: 


• There is a neighborhood structure N(s) for s ∈ S on the search space 
that fits the problem. 
⇒ Principle: "Search for the better near the good!" 


Examples: greedy heuristic (gradient method), neighborhood search, 
variable neighborhood search. 


• There is an evaluation of the elements s ∈ S with respect to their solution 
quality that does not require P(s) to be computed explicitly. 
⇒ Principle: "The most promising first!" 


Examples: A* algorithm, Fred Glover's method: change the current solution 
path at the point of the most promising alternative! 


• There are parts of the proposed solution s ∈ S that are better confirmed 
than others. 
⇒ Principle: "Leave what is certain unchanged and change what is uncertain." 

Examples: backbone method, no-goods method, ant algorithms, 
Kernel Search, selection of the neighborhood structure N_i(s) in 
variable neighborhood search. 


• Avoidance of duplicate searches. 
⇒ Principle: "Order is knowing where you do not need to search in the 
first place!" 


Examples: tabu search, branch-and-bound method. 


• $\Pi: s \in S?\ P(s)$ with $s = (s_1, s_2, \ldots, s_K) \in S_1 \times S_2 \times \ldots \times S_K \subseteq S$ 
can be represented by the subproblems: 
$\Pi_k: s_k \in S_k?\ P_k(s_k)$ with $k = 1, 2, \ldots, K$ 


⇒ Principle: "Decompose the problem into subproblems" 


Examples: Gaussian elimination, crossover operator of the genetic 
algorithm. 


• $\Pi: s \in S?\ P(s)$ is equivalent to the sequence of problems: 

$\Pi_1: s_1 \in S_1?\ P_1(s_1)$, 

$\Pi_2: s_2 \in S_2?\ P_2(s_2 \mid s_1)$, ..., 

$\Pi_K: s_K \in S_K?\ P_K(s_K \mid s_1, s_2, \ldots, s_{K-1})$ 

⇒ Principle: "Pursue intermediate goals!" 

Example: stepwise approximation methods. 

• Let $\Pi_k: s_k \in S?\ P_k(s_k)$, with $k = 1, \ldots, K$, be a sequence of problems with: 
(α) All $\Pi_k$ have the same search space S. 
(β) $\Pi_1$ is easy to solve. 
(γ) $\Pi_k$ is similar to $\Pi_{k+1}$. 


⇒ Principle: "Homotopy search method" 
Find a solution $s_1$ to $\Pi_1$, 

find a solution $s_2$ to $\Pi_2$ in the neighborhood $N(s_1)$, 
find a solution $s_3$ to $\Pi_3$ in the neighborhood $N(s_2)$, 
..., 

find a solution $s_K$ to $\Pi_K$ in the neighborhood $N(s_{K-1})$. 


• The set of possible solution methods is small. 
⇒ Principle: "Transformation of a problem into its meta-problem": 


$\Pi: s \in S?\ P(s)$ ⇒ 
$\Pi_M$: solution method ∈ set of solution methods for $\Pi$? 


with: the solution method solves $\Pi$. 


Table 1: Coordination strategies 


Intensification                  Diversification 
Agents learn from the behavior   The agents divide the task among themselves 
of other agents.                 and behave as differently as possible. 
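
The first principle of the list, "Search for the better near the good!", can be sketched as a greedy neighborhood search. The sketch assumes that the predicate P(s) is expressed through a cost function that is zero exactly where P(s) holds; the function name, the cost function and the neighborhood structure are illustrative choices, not taken from the paper.

```python
def neighborhood_search(start, cost, neighbors, max_steps=1000):
    """Greedy local search: move to the best neighbor while it improves the cost."""
    current = start
    for _ in range(max_steps):
        best = min(neighbors(current), key=cost)
        if cost(best) >= cost(current):
            return current  # local optimum: no neighbor is better
        current = best
    return current

# Toy problem on the integers: P(s) holds where the cost reaches zero.
cost = lambda s: (s - 7) ** 2
neighbors = lambda s: [s - 1, s + 1]  # N(s)
result = neighborhood_search(0, cost, neighbors)
```

The search only ever looks at N(s), which is why it is fast, and also why it can stop in a local optimum when the neighborhood structure does not fit the problem.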


4.2 Coordination Methods for 
Multi-Agent Search Methods 


Multi-agent methods are useful when the work of several agents can be 
coordinated. Since the agents can be distributed across the processors of 
multi-processor computers, large increases in computing speed result in this 
case. The coordination aims at the two strategies in Table 1. If the 
prerequisites of the corresponding rules are satisfied for the specific 
problem under consideration, these strategies can be used in the heuristic to 
coordinate the agents (Table 2). 


5 A General Solution Method for the 
Problems of Control Engineering 


In this section, we first present an approach by which the principles and 
rules provided in the previous section can be used to design a solution 
method for search problems. With this approach, students were able to gain 
experience and solve control engineering problems. This enables a comparison 
of the AI method used with the classical solution approaches. 


Table 2: Strategies of coordination 


Coordination of agents                      Methods that realize it 

"Learn from the best!"                      Particle swarm optimization 
"Learn from the test of each agent to       Nelder-Mead method 
improve the entire agent group" 
"Form the better from good parts"           Genetic algorithms 
"Let many groups try to solve a problem     Neural networks 
and select the most efficient group" 
"Eliminate inefficient agents and insert    Restart method, identification of 
new agents in an efficient way"             promising regions in S 
Combination of many rules                   Fred Glover's "multi-agent search methods" 


As a starting point, it was assumed that the students should, as in practice,
solve a single problem as it arises as quickly as possible, i.e., without
spending much time familiarizing themselves with the underlying mathematics.
For the solution, the method introduced here is made available by means of
the instructions in Section 5.1. The task counts as successfully solved when
a suitable solution to the problem has been found. A further evaluation is
not carried out, for the following reasons:

1. In industrial practice, it is usually of no interest to show that a
   solution that has already been found could also have been found more
   easily.

2. Because of the "No Free Lunch" theorem, an evaluation of the solution
   method always depends on the problem or problems considered. The terms
   that occur in such an evaluation (speed, membership in the class of the
   specific problem, reliability of the solution approach, and some others)
   are hard to pin down. It is known, for example for SAT problems, a subset
   of the class of search problems, that the best solvers for hard problems
   are not optimal for industrial problems.

3. Since validation is a field of its own that requires special
   perspectives, it is addressed only in Chapter 6. In Chapter 5, validation
   is restricted to verifying the correctness of the solution obtained.


5.1 Designing a search method


Given is a problem $\Pi: s \in S?\ P(s)$ and additional knowledge.


(1) Check whether the prerequisites for fast search methods are satisfied by
answering the following questions:

• Are there neighborhood structures on $S$ that fit the problem?
• Are there "promising elements" in $S$?
• Are there criteria for excluding regions of $S$?
• Are there important and unimportant aspects of $S$ or of its elements?
• Do the elements $s \in S$ consist of several parts?
• Can "safe or good parts" be recognized in elements $s \in S$?
• Can the problem be decomposed into several subproblems?
• Are there intermediate goals that can be pursued?
• Can the problem be obtained by "bending" a simple problem?
• Should the agents form a group in order to survey larger regions?


(2) Select some of the methods belonging to those principles whose test
question in (1) was answered with "yes".


(3) Decide whether several agents can be coordinated to realize the methods
from (1). If a multi-agent method appears useful, select the rules for the
coordination.


(4) Realize the selected methods in a heuristic.
(5) Validate the resulting heuristic.


Remark: In (1) to (3), the prerequisites of the principles are checked; (4)
requires software implementation steps, which are checked in (5). The
principles give a "rough" characterization of the methods used. The
efficiency of the heuristic also depends strongly on how well the individual
methods are realized in the heuristic.


5.2 Application and test of the method for designing
heuristics


To give the reader an impression of the tasks that were solved by the
students, they are briefly presented below. All tasks could be solved within
the time available for a master's thesis. Because of the difficulties stated
above, no further evaluation of the solution method was carried out.


5.2.1 Pole placement controller for a linear multi-input system


Task: Find a matrix from the search space $S = \{F \in \mathbb{R}^{n \times p}\}$ such that


    $p_A(\lambda) = \det(\lambda I - (A + BF))$ has the zeros $\lambda_1, \lambda_2, \ldots, \lambda_n$    (2)

subject to the secondary condition that $\|F\|_2$ is minimal.

Solution approach: The problem is very complex because the search space has
very many local optima. Tests were carried out with different topologies on
the search space. Eventually, satisfactory results were obtained with the
Nelder-Mead method for systems up to dimension 6. Since the Nelder-Mead
method coordinates a whole "cloud of agents" over the search space, with the
evaluation of each individual agent influencing the whole cloud, this method
appeared to fit the problem of very many local optima.

Solution implementation: With the notation

    $\sum_{k=0}^{n} a_k(F)\,\lambda^k := \det(\lambda I - (A + BF))$  and  $\sum_{k=0}^{n} \hat{a}_k\,\lambda^k := \prod_{k=1}^{n} (\lambda - \lambda_k)$

the value $\alpha_0 = \sum_{k=0}^{n-1} (a_k(F) - \hat{a}_k)^2$ is defined.

The heuristic minimizes $\alpha_0 + \rho\,\|F\|$, where $\rho$ is reduced as the
search time increases. The search counts as successful if $\alpha_0 < \varepsilon$,
for a prescribed tolerance $\varepsilon$, was reached within the search time.
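
The search described above can be sketched as follows. This is a minimal illustration, not the students' implementation: the example system (a double integrator), the desired poles, the schedule for $\rho$ and the use of `scipy.optimize.minimize` with the Nelder-Mead method are assumptions, and $F$ is stored with one row per input so that $A + BF$ is well defined.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical example system (double integrator), not taken from the text.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
desired_poles = [-1.0, -2.0]
target_coeffs = np.poly(desired_poles)  # coefficients of prod(lambda - lambda_k)

def alpha0(f_flat):
    """Squared mismatch between the closed-loop and the desired polynomial coefficients."""
    F = np.asarray(f_flat, dtype=float).reshape(B.shape[1], A.shape[0])
    coeffs = np.real(np.poly(A + B @ F))  # characteristic polynomial of A + B F
    return float(np.sum((coeffs - target_coeffs) ** 2))

def search(f_start, rho_schedule=(1e-1, 1e-2, 1e-3, 0.0)):
    """Minimize alpha0 + rho * ||F|| while rho is reduced stage by stage (assumed schedule)."""
    f = np.asarray(f_start, dtype=float)
    for rho in rho_schedule:
        cost = lambda v, rho=rho: alpha0(v) + rho * np.linalg.norm(v)
        f = minimize(cost, f, method="Nelder-Mead",
                     options={"xatol": 1e-10, "fatol": 1e-14, "maxiter": 2000}).x
    return f

F_best = search(np.zeros(2))
```

For this single-input example the feedback is unique, so the norm term only steers the early stages; with several inputs it would select among the many feedback matrices that place the same poles.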


5.2.2 Multi-input control problem


Task: Find a discrete-time control function $(u_1(k), u_2(k))$ that transfers
the vehicle shown in Figure 5, with the system equations given in (3), from
the state $X_0$ to the state $X_e$.


    $x_1(k+1) = x_1(k) + T_a \, x_4(k) \cos(x_3(k))$
    $x_2(k+1) = x_2(k) + T_a \, x_4(k) \sin(x_3(k))$
    $x_3(k+1) = x_3(k) + T_a \, x_5(k)$                      (3)
    $x_4(k+1) = x_4(k) + T_a \, u_1(k)$
    $x_5(k+1) = x_5(k) + T_a \, u_2(k)$


Solution approach: A control function consists of several segments. The
solution was therefore sought with the principle of decomposition into
subproblems, and a solution approach with a genetic method was chosen.
Solution implementation: Since the segments differ in importance, the
operators of the genetic algorithm were extended by additional functions
that realize further principles (Table 3).

Figure 5: Vehicle to be controlled


If the routines that realize the principles are each assigned a certain
strength of effect and a certain share of the total computing time of the
heuristic, the relevance of a principle can be inferred by comparing the
size of these values with the remaining error.
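
To make the search problem concrete, the vehicle model (3) and a possible fitness measure can be sketched as follows. The function names and the squared terminal error are assumptions for illustration; the text does not state which fitness function the genetic algorithm used.

```python
import math

def vehicle_step(x, u, t_a):
    """One step of the discrete-time vehicle model (3) with sampling time t_a."""
    x1, x2, x3, x4, x5 = x
    u1, u2 = u
    return (x1 + t_a * x4 * math.cos(x3),
            x2 + t_a * x4 * math.sin(x3),
            x3 + t_a * x5,
            x4 + t_a * u1,
            x5 + t_a * u2)

def simulate(x0, controls, t_a):
    """Apply a sequence of control pairs (u1, u2) and return the final state."""
    x = tuple(x0)
    for u in controls:
        x = vehicle_step(x, u, t_a)
    return x

def terminal_error(x0, controls, x_target, t_a):
    """Squared distance between the reached state and the target state (assumed fitness)."""
    x_end = simulate(x0, controls, t_a)
    return sum((a - b) ** 2 for a, b in zip(x_end, x_target))
```

A candidate control sequence in the search space is then simply a list of pairs $(u_1(k), u_2(k))$, and its segments can be treated as the parts on which the crossover and mutation operators act.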


5.2.3 Single-input control problem


Task: Find a control function $u(t)$ that transfers the three-tank system
shown in Figure 6, with the system equations (4), from $x_0$ to $x_e$.


    $\dot{x}_1(t) = -p\sqrt{x_1(t)} + b_1\,u(t)$
    $\dot{x}_2(t) = -p\sqrt{x_2(t)} + p\sqrt{x_1(t)}$        (4)
    $\dot{x}_3(t) = -p\sqrt{x_3(t)} + p\sqrt{x_2(t)}$


Solution approach: Since a linearization $\dot{x}(t) = A x + b u$ at
$\tilde{x} = 0$ could be determined, and since the system can be transformed
continuously from the linear system into the nonlinear system, as equation
(5) shows, a solution was determined with the homotopy search method.


    $\dot{x}(t) = A x(t) + b\,u(t) + \lambda \big( f(x(t), u(t)) - (A x(t) + b\,u(t)) \big)$    (5)


    with $f(x, u) = \begin{pmatrix} -p\sqrt{x_1} + b_1\,u \\ -p\sqrt{x_2} + p\sqrt{x_1} \\ -p\sqrt{x_3} + p\sqrt{x_2} \end{pmatrix}$    (6)


Table 3: Additional functions of the operators of the genetic algorithm

Operators             Additional principles

Crossover operator    Safe genes are preserved with high probability.
                      Tabu list (avoids excessive similarity of the elements).
                      Efficient agents are considered more often.
Mutation              Variable mutation strength according to the efficiency
                      of the agent.
Agent elimination     Deletion of poor agents.
                      Trend estimation (agents with a poor forecast are
                      eliminated).
New agents            These are created in previously unoccupied regions.


Solution implementation: First, a control function for the associated linear
system was determined, and then the homotopy search method was carried out.
The result for the reached vector $x(t_e)$ shows sufficiently good agreement
with the target value $x_e$, see (7).


    $x_e = \begin{pmatrix} 7.04223167 \\ 6.07614977 \\ 5.1288758 \end{pmatrix}, \qquad x(t_e) = \begin{pmatrix} 7.21634809 \\ 6.13234234 \\ 5.07650415 \end{pmatrix}$    (7)
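
The homotopy idea can be sketched on a deliberately small example. The scalar problem below is hypothetical and is not the three-tank control task: the linear problem $s - 6 = 0$ is deformed step by step into the nonlinear problem $s - 6 + \sqrt{s} = 0$, and each intermediate problem is solved by a local search (Newton's method, an assumed choice) started at the previous solution.

```python
import math

def newton(h, dh, s, tol=1e-12, max_iter=50):
    """Local search for a root of h, started at s."""
    for _ in range(max_iter):
        step = h(s) / dh(s)
        s -= step
        if abs(step) < tol:
            break
    return s

def homotopy_search(s_linear, n_steps=10):
    """Follow the solution from the linear problem (lam = 0) to the nonlinear one (lam = 1)."""
    s = s_linear
    for k in range(1, n_steps + 1):
        lam = k / n_steps
        # H(s, lam) = linear part + lam * (nonlinear part - linear part)
        h = lambda v, lam=lam: (v - 6.0) + lam * math.sqrt(v)
        dh = lambda v, lam=lam: 1.0 + lam / (2.0 * math.sqrt(v))
        s = newton(h, dh, s)  # start in the neighborhood of the previous solution
    return s

s_final = homotopy_search(s_linear=6.0)
```

The structure mirrors (5): for $\lambda = 0$ only the linear problem remains, and for $\lambda = 1$ the full nonlinear problem is recovered.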


5.3 Comparison of methods


Table 4 shows the advantages and disadvantages of the applied method
compared with the classical methods. The biggest open problem in developing
a general problem solver with AI methods is carrying out a validation. This
problem is addressed in more detail in Chapter 6.

Figure 6: Nonlinear three-tank system


6 Validation


6.1 Application of the generative language model ChatGPT


To illustrate the problem of validation, the method considered here is
compared with the language model ChatGPT. ChatGPT is currently being
discussed intensively in the AI community, and some AI researchers ask
themselves whether their research is still up to date or whether they should
switch to this new field. Mathematics educators have also taken note of
ChatGPT and put the following question to it [4]:


    Question to ChatGPT: Max is 78 cm tall. When he steps onto a 20 cm
    high box, he is exactly as tall as Klaus. Paul is 15 cm shorter
    than Klaus. How tall is Paul?


They initially received the wrong answer from ChatGPT: "Paul is 43 cm
tall."

However, adding "Let's proceed step by step." to the question yielded the
correct answer: Paul is 83 cm tall (78 cm + 20 cm = 98 cm for Klaus, and
98 cm - 15 cm = 83 cm for Paul).


Table 4: Advantages and disadvantages of the methods

Classical methods                         AI methods

Knowledge of control engineering          Problems can be solved with little
is necessary.                             knowledge of control engineering.

Each problem requires a                   All problems can be solved with a
specific solution.                        single set of solution instructions.

The developed controllers require         The developed controllers require
little computing time and little          high computational effort and much
energy.                                   energy.

The data requirement is low.              The data requirement can be high
                                          for a large search space.

The solution found can be used            The solution found requires
directly and has the desired              extensive validation.
properties.


The question now is: How can the validation yield an explanation of the
error that makes it possible to improve the heuristic?


Someone who is not a mathematics educator would not think of adding the
sentence above, since one would assume that anyone who finds an answer at
all can only ever find it step by step.


The problem with ChatGPT is that it uses the Internet as its knowledge
background, which consists of a completely unstructured and often
contradictory flood of information. With respect to this chaos of knowledge,
no targeted search for the cause of an error is possible!


That AI should provide explanations for its results has been a topic for
more than 25 years and is the task of "Explainable Artificial Intelligence".
The more recent AI literature contains many indications that progress in
this branch of AI is being achieved considerably more slowly than was
initially hoped. In special application cases, however, a targeted
validation is possible.


6.2 Targeted validation


The method with which a validation can be carried out in our case is the
"means-end account of Explainable Artificial Intelligence" [5]: a targeted
determination of the non-correspondences between the problem and the
heuristic. As shown in Figure 7, we can not only test for the usual types of
error (bias errors, overfitting, overtraining, etc.), but also specifically
test the efficiency of the principles and coordination rules realized in the
heuristic (analogous to what the students did in the example in Section
5.2). Depending on the test result, the effect of a principle can then be
strengthened or weakened, or one principle can be replaced by another.


Since our lists of principles and coordination rules are finite, these tests
(currently still carried out manually) could be automated. The validity of a
principle or rule (with its associated parameters) can be regarded as a
hypothesis whose probability of validity is computed by a Bayes estimator
from the results of the validation.
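
One simple way to realize such a Bayes estimator is sketched below. The Beta-Bernoulli model, the uniform prior, the principle names and the test counts are all assumptions for illustration; the text does not specify the estimator.

```python
def update_validity(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior mean of P(principle is effective) under a Beta(prior_a, prior_b) prior."""
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b)

# Hypothetical validation outcomes per principle: (tests passed, tests failed)
results = {"decomposition": (8, 2), "homotopy": (3, 1), "tabu list": (1, 4)}
validity = {name: update_validity(s, f) for name, (s, f) in results.items()}
ranked = sorted(validity, key=validity.get, reverse=True)
```

Principles at the end of such a ranking would be candidates for weakening or replacement, and those at the top for strengthening.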


7 Schematic representation of a general
problem solver


We start from a problem and first make some general considerations. In our
case, these considerations have made available a horizon of understanding,
which is formed by the lists of principles and coordination rules. With this
horizon of understanding, we can then assemble the principles into a
solution method whose formalization yields a solution heuristic for
computing a solution (cf. [6] for an analogous procedure).


Figure 7: Validation of a heuristic for the solution approach designed here.
[Block diagram: the heuristic (principles, parameters) is validated by tests
for bias, overfitting and overtraining errors and by tests of the principles
and parameters; the resulting explanation of the error is used to adapt the
heuristic to the error.]


Validating the solution against the background of the principles and
coordination rules used yields an explanation of the error and thus the
possibility of improving the heuristic. For general AI systems such as
ChatGPT (or, for example, driver assistance systems), validation currently
faces the problem of an ungraspable and unstructured horizon of
understanding, with respect to which the "notion of explanation" cannot be
specified. The attention mechanism of ChatGPT creates its horizon of
understanding in the first place, so that this horizon is not available to
the validation process. In contrast to our approach, the validation task is
therefore not completely specified mathematically. In "structured
programming", an argumentative level that guarantees adherence to ordering
principles is placed above the strictly mathematical level of the programs.
Analogously, an argumentative level for constructing the heuristics has been
inserted here above the level of the pure heuristics, so that their
selection and organization can be justified.

1 Introduction


Nonlinear state space models are powerful model architectures for system
identification. They provide the flexibility needed to describe nonlinear
dynamic processes while still maintaining a quite compact representation.
Typically, data-driven state space models are black-box models, which limits
their interpretability [9]. Since their modeling performance is satisfactory
and competitive with recurrent neural networks [5], we strive for an
increase in interpretability. Interpretability in terms of physical insight
can be achieved by incorporating prior process information. This leads to a
gray-box identification strategy. Ultimately, the goal of this contribution
is to combine modeling power with the interpretability of the model
parameters.


Gray-box methods require prior process knowledge in addition to measurement
data. Prior knowledge can be available in different forms. This work studies
the case in which the process structure is known a priori in the form of a
physical equation. If a structured identification is carried out, the model
is forced into an explainable form. Then both the modeling task and the
interpretability problem are solved. The central idea of the proposed method
is to reduce the number of model parameters to the physically required
number.

An advantageous architecture for structured identification is the Local
Model State Space Network (LMSSN), developed by Schüssler [6]. Its locally
linear behavior supports the desire for interpretability because the
well-known foundations of linear system theory can be applied.


2  Gray-Box Modeling with the Local Model State
Space Network


LMSSN is an extension of the linear discrete-time state space model for
modeling nonlinear processes. Detailed information about LMSSN can be found
in [6]. On the one hand, it can be seen as a state space framework in which
the multidimensional nonlinear functions in the state and output equations
are implemented with two Local Model Networks (LMN) [1]. On the other hand,
it can be seen as a deep neural network consisting of one recurrent layer
and one dense layer. Expressed with the above-mentioned LMNs, a single-input
single-output LMSSN of order $n_x$ with the input $u(k)$, the state
$\hat{x}(k)$ and the output $\hat{y}(k)$ is defined by


$$
\begin{aligned}
\hat{x}(k+1) &= \sum_{j=1}^{n_{state}} \left( o_j^{[d]} + A_j^{[d]} \hat{x}(k) + b_j^{[d]} u(k) \right) \Phi_{state,j}(k) \\
\hat{y}(k) &= \sum_{j=1}^{n_{out}} \left( p_j^{[d]} + c_j^{[d]T} \hat{x}(k) + d_j^{[d]} u(k) \right) \Phi_{out,j}(k).
\end{aligned}
\tag{1}
$$


Here, $o_j^{[d]}$ is the offset of the state equation, and $A_j^{[d]}$ and
$b_j^{[d]}$ can be interpreted as the slopes of the j-th model of the state
equation. Accordingly, $p_j^{[d]}$ is the offset of the output equation, and
$c_j^{[d]T}$ and $d_j^{[d]}$ are the slopes of the j-th model of the output
equation. (The superscript $(\cdot)^{[d]}$ marks the discrete-time
description.) Altogether, there are $n_{state}$ superposed affine models in
the state equation network and $n_{out}$ in the output equation network. The
basis functions $\Phi_j(k)$ express the validity of the j-th local model.
They are realized with normalized Gaussians and generate a global nonlinear
function by superposition of the local affine models. Because these local
state space models are fully parameterized, LMSSN is a black-box model.

In the following, the steps of structured identification are presented. The
procedure requires a novel initialization technique for LMSSN models.
Normally, LMSSN is initialized as a linear state space model before it is
divided into several local models with the help of the Local Linear Model
Tree (LOLIMOT) or the Hierarchical Local Model Tree (HILOMOT) algorithm [1].
The initial model for this tree-construction algorithm is the Best Linear
Approximation (BLA) of the process, estimated from input-output data
$\{u(kT_0), y(kT_0)\}$. It is generated via a subspace-based system
identification method [2] and results in a linear, fully parameterized
model. If this model is restructured and restricted, it becomes a linear
gray-box model. With the help of LOLIMOT, a nonlinear gray-box model is then
derived by splitting the extended input space
$\tilde{u} = [\hat{x}^T, u]^T$ and adding local models. Note that the
structure of the initial model is kept as splitting progresses and is passed
on to the local models generated by LOLIMOT. The restructuring step is
described in more detail below. Figure 1 shows the workflow for gray-box
structured identification.
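
The model equation (1) can be sketched as a single evaluation step. This is a minimal illustration under assumptions: the dictionary layout, the axis-aligned Gaussians over the extended input $[\hat{x}, u]$ and all numerical values are hypothetical and do not reproduce the LMSSN implementation from [6].

```python
import numpy as np

def normalized_gaussians(z, centers, widths):
    """Validity functions Phi_j(z): Gaussians normalized so that they sum to one."""
    d = (z[None, :] - centers) / widths          # shape (n_models, dim)
    mu = np.exp(-0.5 * np.sum(d * d, axis=1))
    return mu / np.sum(mu)

def lmssn_step(x, u, state_models, out_models):
    """One evaluation of (1): returns the next state x(k+1) and the output y(k)."""
    z = np.append(x, u)                          # extended input [x, u]
    phi_s = normalized_gaussians(z, state_models["centers"], state_models["widths"])
    phi_o = normalized_gaussians(z, out_models["centers"], out_models["widths"])
    x_next = sum(p * (o + A @ x + b * u)
                 for p, o, A, b in zip(phi_s, state_models["o"],
                                       state_models["A"], state_models["b"]))
    y = sum(p * (off + c @ x + d * u)
            for p, off, c, d in zip(phi_o, out_models["p"],
                                    out_models["c"], out_models["d"]))
    return x_next, float(y)

# Hypothetical second-order example: two local state models, one output model.
state_models = {
    "centers": np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
    "widths": np.ones((2, 3)),
    "o": [np.zeros(2), np.zeros(2)],
    "A": [np.array([[0.0, 1.0], [-0.5, 0.8]]), np.array([[0.0, 1.0], [-0.9, 0.8]])],
    "b": [np.array([0.0, 1.0])] * 2,
}
out_models = {"centers": np.zeros((1, 3)), "widths": np.ones((1, 3)),
              "p": [0.0], "c": [np.array([1.0, 0.0])], "d": [0.0]}
x_next, y = lmssn_step(np.array([0.0, 0.0]), 1.0, state_models, out_models)
```

In this example both local state models share the first row of $A$ and the vector $b$, which corresponds to the idea that a gray-box structure fixed in the initial model is inherited by every local model.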


Canonical state space forms like the Canonical Controllable Form (CCF) are
easy to achieve via similarity transformations [8]. An arbitrary gray-box
structure requires a more sophisticated restructuring strategy because the
transformation cannot be calculated directly. Therefore, three possible
methods are stated. A nonlinear unconstrained optimization with an
additional penalty term, called the Penalty Method (PM) [7], can force the
free parameters to their desired values. Alternatively, a classical gray-box
technique is the Prediction Error Method (PEM) [6]. It uses hard constraints
to implement the known parameters $\theta_{con}$ by placing them in the
model. Here, the fully parameterized black-box model based on the BLA is
only necessary for initialization. Another alternative is to estimate a
specific transformation matrix that leads to the desired state space form
[4].
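
For the canonical case, the similarity transformation can be computed directly. The following sketch uses the textbook construction from the controllability matrix for a single-input system; the example matrices are hypothetical, and the construction is not claimed to be the one used in [8].

```python
import numpy as np

def to_controllable_form(A, b):
    """Similarity transform z = T x that brings (A, b) to controllable canonical form."""
    n = A.shape[0]
    # Controllability matrix [b, A b, ..., A^(n-1) b]
    ctrb = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(n)])
    q = np.linalg.inv(ctrb)[-1, :]               # last row of its inverse
    T = np.vstack([q @ np.linalg.matrix_power(A, k) for k in range(n)])
    T_inv = np.linalg.inv(T)
    return T @ A @ T_inv, T @ b, T

A = np.array([[1.0, 2.0], [3.0, 4.0]])           # hypothetical fully parameterized model
b = np.array([1.0, 1.0])
A_c, b_c, T = to_controllable_form(A, b)
```

The transformed pair has the fixed entries 0 and 1 in known positions, so only the last row of `A_c` remains as free parameters. For an arbitrary gray-box structure no such closed-form construction is available, which is why the three methods above are needed.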


After the restructuring step, only the free parameters are optimized while
the constrained parameters are kept "frozen". In the following step, LOLIMOT
generates a nonlinear global model by partitioning $\tilde{u}$. Finally, the
described procedure yields a nonlinear model containing the desired gray-box
structure. As a post-processing step, the physical parameters can be
extracted, which is useful for analysis and gives insight into the process.

Figure 1: Structured identification procedure. The BLA yields the parameter
vector $\theta_{BLA}$. For restructuring, the constrained parameters
$\theta_{con}$ and their indices are used. The structured parameters are in
the vector $\theta_{str}$, while the parameters of the nonlinear model are
in $\theta_{str,nonlin}$. The parameter extraction decodes
$\theta_{str,nonlin}$ into the interpretable parameters $\theta_{phys}$.


3 Test Process


The proposed structured LMSSN is demonstrated on simulated data from a
mechanical system, a moving body with a single degree of freedom. The
system's input $u(t)$ is the excitation force acting on the center of
gravity, while the output $y(t)$ is its position:


$$
M\ddot{y}(t) + D\dot{y}(t) + \underbrace{F_S(y)}_{=F_{S,0}\,(e^{\gamma y} - 1)} = u(t). \tag{2}
$$


Here, the body's mass is $M$ and the linear damping ratio is $D$. A static
progressive spring curve $F_S(y)$ leads to a nonlinear equation of motion.
The stiffness characteristic is parameterized with the curve offset
$F_{S,0}$ and the exponential stiffness rate $\gamma$. For the excitation of
the system, a step-like signal containing 96 events with random levels in
the interval [0 N, 1 N] is chosen. The numerical integration of the equation
of motion, necessary for the output calculation, is carried out with the
forward Euler method. After data generation, the output signal was
artificially disturbed with additive white Gaussian noise at a
signal-to-noise ratio of 49 dB.
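
The data generation can be sketched as follows. All parameter values (mass, damping, spring offset, stiffness rate, step size) are assumptions for illustration, since the text does not list them, and a single constant force level is used here instead of the 96-event step signal.

```python
import math
import random

def simulate(u_seq, dt, M=1.0, D=2.0, F0=1.0, gamma=1.0):
    """Forward-Euler integration of M*y'' + D*y' + F0*(exp(gamma*y) - 1) = u."""
    y, v, out = 0.0, 0.0, []
    for u in u_seq:
        out.append(y)
        acc = (u - D * v - F0 * (math.exp(gamma * y) - 1.0)) / M
        y, v = y + dt * v, v + dt * acc
    return out

def add_noise(y_seq, snr_db, seed=0):
    """Add white Gaussian noise scaled to the requested signal-to-noise ratio."""
    rng = random.Random(seed)
    p_signal = sum(y * y for y in y_seq) / len(y_seq)
    sigma = math.sqrt(p_signal / 10.0 ** (snr_db / 10.0))
    return [y + rng.gauss(0.0, sigma) for y in y_seq]

dt = 1e-3
y_clean = simulate([1.0] * 30000, dt)   # one constant 1 N level for illustration
y_noisy = add_noise(y_clean, snr_db=49.0)
```

For a constant force $u$, the position settles where the spring force balances the input, $y = \ln(1 + u/F_{S,0})/\gamma$, which gives a quick plausibility check of the simulation.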


Next, we state the specific prior knowledge of the process (see (2))
required for gray-box identification. In the present case, the gray-box
knowledge is the information that the process can be approximated by a
second-order system whose numerator equals one (PT$_2$ system¹). This
knowledge is applied to the model. A favorable structure for this task is a
Nonlinear Controllable Form (NCF). Compared to the CCF, the initial NCF has
an additional offset in the state equation. The NCF is written as


$$
\begin{aligned}
\dot{x}(t) &= \underbrace{\begin{bmatrix} 0 & 1 \\ \theta_{free,1} & \theta_{free,2} \end{bmatrix}}_{A(\theta_{free})} x(t)
 + \underbrace{\begin{bmatrix} 0 \\ 1 \end{bmatrix}}_{b} u(t)
 + \underbrace{\begin{bmatrix} 0 \\ \theta_{free,3} \end{bmatrix}}_{o(\theta_{free})} \\
y(t) &= \underbrace{\begin{bmatrix} \theta_{free,4} & 0 \end{bmatrix}}_{c^T(\theta_{free})} x(t)
 + \underbrace{[\,0\,]}_{d} u(t) + \underbrace{[\,0\,]}_{p}.
\end{aligned}
\tag{3}
$$


For the sake of completeness, the relations for the linear case are


$$
\theta_{free,1} = -\frac{C}{M}, \quad \theta_{free,2} = -\frac{D}{M}, \quad \theta_{free,3} = 0, \quad \theta_{free,4} = \frac{1}{M}. \tag{4}
$$


Here, $C = \text{const.}$ is the stiffness of a hypothetical linear spring.
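
The relations in (4) can be checked with a short derivation. It is sketched here under the assumption that the states are chosen as $x_1 = M y$ and $x_2 = M \dot{y}$; this state choice is not stated explicitly in the text.

```latex
\begin{align*}
  M\ddot{y} + D\dot{y} + C y &= u,
    \qquad x_1 = M y,\quad x_2 = M\dot{y} \\
  \dot{x}_1 &= x_2 \\
  \dot{x}_2 &= M\ddot{y} = -\frac{C}{M}\,x_1 - \frac{D}{M}\,x_2 + u \\
  y &= \frac{1}{M}\,x_1
\end{align*}
```

Comparing the coefficients with (3) gives $\theta_{free,1} = -C/M$, $\theta_{free,2} = -D/M$, $\theta_{free,3} = 0$ and $\theta_{free,4} = 1/M$, with the input vector fixed to $[0, 1]^T$.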


¹ A PT$_2$ system is a second-order transfer function without zeros.

4 Parameter Extraction


The final step is the physical interpretation of the model parameters,
compare Fig. 1. For the nonlinear equation, these are the body's mass,
damping ratio and stiffness. The parameter extraction delivers the spring
force $\hat{F}_S(y)$ as a function of the position $y$. Since we modeled
with local affine functions, we obtain $\hat{F}_S(y)$ as the weighted sum of
$n_{LM}$ local affine stiffness functions $\hat{F}_{affine,i}(y)$:


$$
\hat{F}_S(y) = \sum_{i=1}^{n_{LM}} \hat{F}_{affine,i}(y)\,\Phi_i(y)
             = \sum_{i=1}^{n_{LM}} \left( \hat{C}_{lin,i}\, y + \hat{C}_{off,i} \right) \Phi_i(y). \tag{5}
$$


Here, $\hat{C}_{lin,i}$ is the linear stiffness and $\hat{C}_{off,i}$ the
offset of the i-th local stiffness model. With (3), we can extract
$\hat{C}_{lin,i}$ and $\hat{C}_{off,i}$ from the estimated model
parameters


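$$
\hat{C}_{lin,i} = -\frac{\hat{\theta}_{1,i}}{\hat{\theta}_{4,i}}, \qquad \hat{C}_{off,i} = -\hat{\theta}_{3,i}. \tag{6}
$$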


Alternatively, the function $\hat{\theta}_1(y)$, which describes the
variation of the parameter $\theta_1$ with $y(t)$, can be constructed from
(5) and (6) as a weighted sum:


$$
\hat{\theta}_1(y) = \sum_{i=1}^{n_{LM}} \hat{\theta}_{1,i}\,\Phi_i(y). \tag{7}
$$


The left side of Fig. 2 visualizes the extracted validity functions and
$\hat{\theta}_1(y)$. On the right, the true spring curve is plotted, which
has been fitted accurately by the LMN.

Figure 2: Local Linear Stiffness


5 Conclusion 


This contribution showed that a nonlinear data-driven state space model with
a physically inspired structure is able to combine interpretability with
high performance. Furthermore, computational resources can be saved compared
to an unstructured black-box model because the gray-box model contains fewer
parameters that have to be optimized.


We demonstrated our approach on a widely known but simple example process.
Next, our gray-box method shall be extended to more complex processes like
the Bouc-Wen hysteresis benchmark.

Abstract


This paper presents the derivation of the equations of motion of a 3-DOF 
gyroscope with a pendulum attachment through the Euler-Lagrange approach, 
followed by a conversion into a Takagi-Sugeno Fuzzy Model. First, suitable 
coordinate frames and generalized coordinates are defined, followed by the 
definition of the kinetic energy of each body frame of the gyroscope. Next, 
the kinetic and potential energy of the pendulum attachment is described. The 
derived equations of motion are then validated by simulation and compared 
to the behavior of a testbed system (see Figure 1). The conversion to the 
Takagi-Sugeno fuzzy model is done by a weighted combination of locally valid 
linear models. A set of adequate premise variables and membership functions 
represent the nonlinear system. Finally, a controller synthesis through parallel 
distributed compensation and LMIs satisfying local quadratic Lyapunov func- 
tions is conducted and validated by simulation. 


1 Introduction 


The aim of this paper is to derive and validate the equations of motion of a
"Control Moment Gyroscope" (CMG) with 3 degrees of freedom, extended by a
pendulum attachment, similar to the testbed used in [1]. For the derivation,
the modeling approach of the CMG from [2] and the modeling approach of the
Furuta pendulum in [3] are combined. The derived equations of motion are
augmented with friction terms and validated using measurement data from the
real testbed. Subsequently, the developed equations of motion are
transformed into a Takagi-Sugeno formulation, and a Parallel Distributed
Compensation (PDC) controller is derived using Linear Matrix Inequalities
(LMIs) to enforce closed-loop dynamic constraints.

Figure 1: Testbed system


2 Methods 


2.1 Nomenclature 


The notation in this paper is as follows. Vectors are written in italics
with an underline, $\underline{x}$; matrices $\mathbf{A}$ are written in
bold; and scalars $s$ are written in italics. Furthermore, $\prec$ and
$\succ$ indicate negative and positive definiteness, respectively.
$\mathbf{E}$ denotes the identity matrix.


Figure 2: System sketch of the CMG Furuta pendulum with 3 degrees of
freedom, the body frames D, G, F, P and the reference frame N. For clarity,
the origins of the coordinate systems (except that of the pendulum) are
depicted with an offset.


2.2 Modelling 


The CMG pendulum system in Figure 2 is described using five frames, namely
the body frames Disk (D), Gimbal (G), Frame (F) and Pendulum (P), and the
fixed reference frame (N). The coordinate systems are defined using the unit
vectors $\underline{e}_i^j$ with $i = x, y, z$ as axes and
$j = D, G, F, P, N$ as the associated frame. The reference system N is
chosen to be stationary, so that movements of the other coordinate systems
relative to N constitute absolute velocities. The torques acting on the body
frames are denoted as $\tau_n$ with $n = 1, 2, 3, 4$ along the corresponding
rotational axis, following the right-hand rule.

The modelling of the CMG part of the overall system follows [2]. The
generalized coordinates are therefore chosen as


$$
\begin{aligned}
\underline{q} &:= [q_1, q_2, q_3, q_4]^T \\
\underline{\dot{q}} &:= [\dot{q}_1, \dot{q}_2, \dot{q}_3, \dot{q}_4]^T \\
\underline{\ddot{q}} &:= [\ddot{q}_1, \ddot{q}_2, \ddot{q}_3, \ddot{q}_4]^T,
\end{aligned}
\tag{1}
$$


where $q_1$ represents the disk position about $\underline{e}_y^D$, $q_2$
the gimbal position about $\underline{e}_x^G$, $q_3$ the frame position
about $\underline{e}_z^F$, and $q_4$ the pendulum position about
$\underline{e}_x^P$. The controllable inputs of the system are the torques
$\tau_1$ and $\tau_2$. The disturbance torques are $\tau_3$ and $\tau_4$.
All torques are combined into the vector


$$
\underline{\tau} := [\tau_1, \tau_2, \tau_3, \tau_4]^T. \tag{2}
$$


The positive rotation directions of all coordinate systems follow the
right-hand rule. Friction terms will be added later. The centers of mass of
the bodies that make up the CMG are assumed to be at the center of the disk.
The pendulum has the mass $m_P$, and its center of mass is located at the
effective pendulum length $l_P$. The distance from the pendulum coordinate
system's rotation axis to the frame's rotation axis is $L_1$. The moments of
inertia of the bodies around the different coordinate axes are represented
in tensor form to calculate the kinetic energies for the Lagrange function.

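$$
\begin{aligned}
\mathbf{I}^D &= \begin{bmatrix} J_{Dxx} & 0 & 0 \\ 0 & J_{Dyy} & 0 \\ 0 & 0 & J_{Dzz} \end{bmatrix}, &
\mathbf{I}^G &= \begin{bmatrix} J_{Gxx} & 0 & 0 \\ 0 & J_{Gyy} & 0 \\ 0 & 0 & J_{Gzz} \end{bmatrix}, \\
\mathbf{I}^F &= \begin{bmatrix} J_{Fxx} & 0 & 0 \\ 0 & J_{Fyy} & 0 \\ 0 & 0 & J_{Fzz} \end{bmatrix}, &
\mathbf{I}^P &= \begin{bmatrix} J_{Pxx} & 0 & 0 \\ 0 & J_{Pyy} & 0 \\ 0 & 0 & J_{Pzz} \end{bmatrix}
\end{aligned}
\tag{3}
$$


Rotation matrices are used later to describe the rotational positions of the
bodies relative to each other. By combining the rotation matrices, all
positions of the bodies can be represented in relation to the reference
coordinate system. The rotation matrices $\mathbf{R}_i$ around the
coordinate axes $i = x, y, z$ of the reference coordinate system are defined
as follows:


$$
\begin{aligned}
\mathbf{R}_x(q_j) &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos(q_j) & -\sin(q_j) \\ 0 & \sin(q_j) & \cos(q_j) \end{bmatrix} \\
\mathbf{R}_y(q_j) &= \begin{bmatrix} \cos(q_j) & 0 & \sin(q_j) \\ 0 & 1 & 0 \\ -\sin(q_j) & 0 & \cos(q_j) \end{bmatrix} \\
\mathbf{R}_z(q_j) &= \begin{bmatrix} \cos(q_j) & -\sin(q_j) & 0 \\ \sin(q_j) & \cos(q_j) & 0 \\ 0 & 0 & 1 \end{bmatrix}
\end{aligned}
\tag{4}
$$


Here, the notation $\mathbf{R}_i^j$ is introduced, which describes the
rotation of the coordinate system $j$ with respect to $i$.


$$
\begin{aligned}
\mathbf{R}_G^D &= \mathbf{R}_y(q_1) \\
\mathbf{R}_F^G &= \mathbf{R}_x(q_2) \\
\mathbf{R}_N^F &= \mathbf{R}_z(q_3) \\
\mathbf{R}_F^P &= \mathbf{R}_x(q_4)
\end{aligned}
\tag{5}
$$


By multiplying the rotation matrices accordingly, the rotation of any
desired coordinate system of the CMG pendulum relative to the reference
coordinate system can be described.


$$
\mathbf{R}_N^D = \mathbf{R}_N^F\,\mathbf{R}_F^G\,\mathbf{R}_G^D \tag{6}
$$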

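With the rotation matrices, the angular velocities of the different bodies
can be represented in the reference coordinate system.


$$
\begin{aligned}
\underline{\omega}_{N,F} &= \begin{bmatrix} \underline{0} & \underline{0} & \underline{Z}_N^N & \underline{0} \end{bmatrix} \underline{\dot{q}} && (7) \\
\underline{\omega}_{N,G} &= \begin{bmatrix} \underline{0} & \underline{X}_N^F & \underline{Z}_N^N & \underline{0} \end{bmatrix} \underline{\dot{q}} && (8) \\
\underline{\omega}_{N,D} &= \begin{bmatrix} \underline{Y}_N^G & \underline{X}_N^F & \underline{Z}_N^N & \underline{0} \end{bmatrix} \underline{\dot{q}} && (9)
\end{aligned}
$$

with the rotational descriptions


$$
\underline{Z}_N^N = \underline{e}_z, \qquad
\underline{X}_N^F = \mathbf{R}_N^F\,\underline{e}_x, \qquad
\underline{Y}_N^G = \mathbf{R}_N^F\,\mathbf{R}_F^G\,\underline{e}_y. \tag{10}
$$


The matrices describing the rotations in (7), (8), and (9) are hereafter
denoted as $\mathbf{J}_{N,j}$, shown here for the disk as an example:


$$
\mathbf{J}_{N,D} = \begin{bmatrix} \underline{Y}_N^G & \underline{X}_N^F & \underline{Z}_N^N & \underline{0} \end{bmatrix} \tag{11}
$$


Additionally, the set of coordinate systems $\mathcal{J} = \{D, G, F\}$ is
defined. From this, the kinetic energy $T_{CMG}(\underline{q}, \underline{\dot{q}})$
of the CMG part of the overall system can be defined as:


$$
T_{CMG}(\underline{q}, \underline{\dot{q}}) = \frac{1}{2} \sum_{j \in \mathcal{J}} \underline{\dot{q}}^T\, \mathbf{J}_{N,j}^T\, \mathbf{R}_N^j\, \mathbf{I}^j\, (\mathbf{R}_N^j)^T\, \mathbf{J}_{N,j}\, \underline{\dot{q}} \tag{12}
$$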
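The description of the kinetic energy of the pendulum follows [3]. The
linear velocity $\underline{v}_P$ and the angular velocity
$\underline{\omega}_P$ of the pendulum are described separately and then
combined. First, the angular velocity of the pendulum arm is determined as
follows:


$$
\underline{\omega}_P = \mathbf{R}_F^P \begin{bmatrix} 0 \\ 0 \\ \dot{q}_3 \end{bmatrix} + \begin{bmatrix} \dot{q}_4 \\ 0 \\ 0 \end{bmatrix}
 = \begin{bmatrix} \dot{q}_4 \\ -\dot{q}_3 \sin(q_4) \\ \dot{q}_3 \cos(q_4) \end{bmatrix} \tag{13}
$$


The linear velocity $\underline{v}_P$ of the pendulum arm is composed of the
translational velocity of the pendulum joint


$$
\underline{v}_{P,j} = \mathbf{R}_F^P \left( \underline{\omega}_{N,F} \times [L_1, 0, 0]^T \right)
 = \begin{bmatrix} 0 \\ \cos(q_4)\, L_1 \dot{q}_3 \\ \sin(q_4)\, L_1 \dot{q}_3 \end{bmatrix} \tag{14}
$$

and the translational velocity of the pendulum's center of mass


$$
\underline{v}_{P,m} = \underline{\omega}_P \times [0, 0, l_P]^T
 = \begin{bmatrix} -\sin(q_4)\, l_P \dot{q}_3 \\ -l_P \dot{q}_4 \\ 0 \end{bmatrix} \tag{15}
$$


which results in


$$
\underline{v}_P = \underline{v}_{P,j} + \underline{v}_{P,m}
 = \begin{bmatrix} -\sin(q_4)\, l_P \dot{q}_3 \\ \cos(q_4)\, L_1 \dot{q}_3 - l_P \dot{q}_4 \\ \sin(q_4)\, L_1 \dot{q}_3 \end{bmatrix}. \tag{16}
$$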
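
The closed-form expressions in (13) to (16) can be cross-checked numerically. The sketch below assumes that the rotation matrix used in (13) and (14) is $\mathbf{R}_x(q_4)$ from (4) and that the frame's angular velocity is $[0, 0, \dot{q}_3]^T$; the function names and the numerical values in the usage example are illustrative only.

```python
import numpy as np

def rot_x(q):
    """Rotation matrix R_x from (4)."""
    c, s = np.cos(q), np.sin(q)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def pendulum_velocities(q3_dot, q4, q4_dot, L1, lP):
    """Evaluate (13)-(16): angular velocity and linear velocity of the pendulum."""
    omega_frame = np.array([0.0, 0.0, q3_dot])
    omega_P = rot_x(q4) @ omega_frame + np.array([q4_dot, 0.0, 0.0])   # (13)
    v_joint = rot_x(q4) @ np.cross(omega_frame, [L1, 0.0, 0.0])        # (14)
    v_cm = np.cross(omega_P, [0.0, 0.0, lP])                           # (15)
    return omega_P, v_joint + v_cm                                     # (16)

omega_P, v_P = pendulum_velocities(q3_dot=0.7, q4=0.3, q4_dot=-1.2, L1=0.25, lP=0.15)
```

Evaluating the function for a few random configurations and comparing with the right-hand sides of (13) and (16) is a cheap way to catch sign errors before the expressions enter the Lagrangian.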


The kinetic energy of the pendulum is then given by

$$T_P(q, \dot q) = \frac{1}{2} \left( v_P^T (m_P E) v_P + \omega_P^T I_P \omega_P \right), \tag{17}$$

where $E$ represents the identity matrix. Since the pendulum has potential energy, and the reference height is chosen at $q_4 = \pi$ (hanging pendulum), the potential energy of the pendulum $V_P$ is

$$V_P(q) = g \cdot m_P \cdot l_P (1 + \cos(q_4)). \tag{18}$$


With the kinetic and potential energy of all bodies, the Lagrangian can be formulated as

$$\mathcal{L}(q, \dot q) = T_{CMG}(q, \dot q) + T_P(q, \dot q) - V_P(q). \tag{19}$$

Now, the equations of motion can be derived using the Euler-Lagrange formalism:

$$\frac{d}{dt} \frac{\partial \mathcal{L}}{\partial \dot q} - \frac{\partial \mathcal{L}}{\partial q} = \tau \tag{20}$$

Due to the total number of bodies, four equations of motion are derived, describing the accelerations of the different bodies. The Lagrangian (19) is formulated with the Symbolic Math Toolbox in MATLAB, and the Euler-Lagrange equations are derived according to (20) using the community function "EulerLagrange".


2.3 Takagi-Sugeno System 


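The TS-Model is constructed via local linearization; therefore, consider the nonlinear system

$$\dot x = f(x, u), \qquad y = h(x). \tag{21}$$

Through the first-order Taylor series expansion, we obtain the matrices

$$A_i = \left. \frac{\partial f}{\partial x} \right|_c, \qquad B_i = \left. \frac{\partial f}{\partial u} \right|_c, \qquad C_i = \left. \frac{\partial h}{\partial x} \right|_c \tag{22}$$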


where $c$ describes the linearization point. We then formulate the TS-System, where all linear submodels are blended into each other by the membership functions $h_i(z)$

$$\dot x = \sum_{i=1}^{N_r} h_i(z) \left( A_i x + B_i u + a_i \right), \qquad y = \sum_{i=1}^{N_r} h_i(z) \left( C_i x + c_i \right) \tag{23}$$


where $z$ denotes the vector of premise variables that determine which model is active at any given time and $N_r$ is the number of linearization points [4]. The membership functions $h_i(z)$ are chosen to be triangular and fulfill the properties $1 \geq h_i(z) \geq 0$ and $\sum_{i=1}^{N_r} h_i(z) = 1$. Each submodel $i$ represents the nonlinear system exactly at its linearization point $c$, i.e., if $h_i = 1$, only the submodel $\dot x = A_i x + B_i u + a_i$ is active and represents the current dynamics fully [5]. The affine terms $a_i$ and $c_i$ of non-equilibrium linearization points are computed as follows:

$$a_i = f(x_c, u_c) - A_i x_c - B_i u_c, \qquad c_i = h(x_c) - C_i x_c \tag{24}$$

For simplicity, the affine terms are neglected in the controller synthesis.
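The blending in (23) and the affine terms in (24) can be illustrated with a minimal sketch for a scalar example system. The system $f(x, u) = \sin(x) + u$, the choice of linearization points, and all function names are assumptions made for illustration; the state itself is used as the premise variable.

```python
import math


def triangular_memberships(z, centers):
    """Triangular membership values h_i(z) for sorted centers.

    The values are non-negative and sum to one; outside the outermost
    centers the nearest submodel is fully active.
    """
    h = [0.0] * len(centers)
    if z <= centers[0]:
        h[0] = 1.0
        return h
    if z >= centers[-1]:
        h[-1] = 1.0
        return h
    for i in range(len(centers) - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= z <= hi:
            w = (z - lo) / (hi - lo)
            h[i], h[i + 1] = 1.0 - w, w
            break
    return h


def linearize(c):
    """A_i, B_i and affine term a_i of f(x, u) = sin(x) + u at x = c, u = 0."""
    A = math.cos(c)
    B = 1.0
    a = math.sin(c) - A * c        # a_i = f(x_c, u_c) - A_i x_c - B_i u_c
    return A, B, a


def ts_model(x, u, centers):
    """Blended right-hand side of the TS system, premise variable z = x."""
    h = triangular_memberships(x, centers)
    return sum(hi * (A * x + B * u + a)
               for hi, (A, B, a) in zip(h, map(linearize, centers)))
```

At a linearization point the blended model reproduces the nonlinear function exactly; between two points it interpolates the neighboring linearizations.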


2.4 Parallel Distributed Compensation 


The fuzzy controller is formulated in a similar manner as the TS-Model by utilizing local controller gains for each linearization point $c$ and blending these together with the membership functions $h_j(z)$ of the model as the premise variables $z$ change [6].

$$u = -\sum_{j=1}^{N_r} h_j(z) K_j x \tag{25}$$


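Substituting (25) into (23) and neglecting the affine term $a_i$ yields the closed-loop TS-System of the form:

$$\dot x = \sum_{i=1}^{N_r} \sum_{j=1}^{N_r} h_i(z) h_j(z) \left( A_i - B_i K_j \right) x \tag{26}$$

With $G_{ij} = A_i - B_i K_j$, this can be written in compact form:

$$\dot x = \sum_{i=1}^{N_r} \sum_{j=1}^{N_r} h_i(z) h_j(z) G_{ij} x \tag{27}$$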
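A minimal sketch of the PDC law (25) for two premise variables is given below. It assumes that the membership value of each rule is the product of one triangular membership per premise variable, which yields four rules for two linearization values per variable; this construction, the optional saturation argument, and the gains used in any call are illustrative assumptions rather than the authors' implementation.

```python
def tri_pair(z, lo, hi):
    """Two triangular memberships on [lo, hi], clamped outside the interval."""
    w = min(1.0, max(0.0, (z - lo) / (hi - lo)))
    return [1.0 - w, w]


def pdc_control(x, z, bounds, gains, u_max=None):
    """Blend local state-feedback gains according to u = -sum_j h_j(z) K_j x.

    x      : state vector (list of floats)
    z      : two premise variables
    bounds : [(lo, hi), (lo, hi)] linearization values per premise variable
    gains  : four gain vectors, ordered (lo, lo), (lo, hi), (hi, lo), (hi, hi)
    u_max  : optional symmetric input saturation
    """
    h_a = tri_pair(z[0], *bounds[0])
    h_b = tri_pair(z[1], *bounds[1])
    weights = [ha * hb for ha in h_a for hb in h_b]   # sums to one
    u = -sum(h * sum(k * xi for k, xi in zip(K, x))
             for h, K in zip(weights, gains))
    if u_max is not None:
        u = max(-u_max, min(u_max, u))
    return u, weights
```

Because the weights form a partition of unity, the blended gain always lies in the convex hull of the local gains, which is what allows the closed loop to be written as the double sum in (26).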


2.5 Controller Synthesis via LMIs 
For controller synthesis, the local quadratic Lyapunov functions are applied:

$$V_i(x) = x^T P_i x \tag{28}$$

$$\dot V_i(x) = \dot x^T P_i x + x^T P_i \dot x \tag{29}$$


where $P_i$ is a symmetric, positive definite matrix. To find the controller gains, we substitute (27) into (29) and define $P_i = X_i^{-1}$ and $K_i = M_i X_i^{-1}$. As the LMIs are solved independently for each local controller $j$, $i = j$ holds for controller synthesis. The requirements for asymptotic Lyapunov stability

$$V_i(x) > 0 \quad \forall x \neq 0, \qquad V_i(x) = 0, \quad x = 0 \tag{30}$$

$$\dot V_i(x) < 0 \quad \forall x \neq 0 \tag{31}$$


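can then be expressed as LMIs, which constrain the closed-loop poles to convex regions of the complex plane [7]. The so-called D-Region LMI constraint then results in the formulation:

Find a matrix $X_i = X_i^T > 0$ and $M_i$ for a desired $\alpha > 0$, $r > \alpha$, and $\theta > 0$ under the constraints:

$$X_i A_i^T + A_i X_i - M_i^T B_i^T - B_i M_i + 2 \alpha X_i < 0, \tag{32}$$

$$\begin{bmatrix} -r X_i & A_i X_i - B_i M_i \\ X_i A_i^T - M_i^T B_i^T & -r X_i \end{bmatrix} < 0, \tag{33}$$

$$\begin{bmatrix} \sin\theta \left( X_i A_i^T + A_i X_i - M_i^T B_i^T - B_i M_i \right) & \cos\theta \left( A_i X_i - B_i M_i - X_i A_i^T + M_i^T B_i^T \right) \\ \cos\theta \left( X_i A_i^T - M_i^T B_i^T - A_i X_i + B_i M_i \right) & \sin\theta \left( A_i X_i - B_i M_i + X_i A_i^T - M_i^T B_i^T \right) \end{bmatrix} < 0 \tag{34}$$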


where $\alpha$ denotes the minimum required decay rate, $r$ defines the radius of a half circle in the open left half of the complex plane centered at the origin, and $\theta$ defines the angle between the real axis and the boundary of a cone opening toward the open left half-plane [8].
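The region described by $\alpha$, $r$, and $\theta$ can be made concrete with a small membership test for a single closed-loop pole. The sketch below is a geometric check only and does not solve the LMIs (32) to (34); the function name and the example radius are assumptions for illustration.

```python
import math


def in_d_region(pole, alpha, r, theta):
    """Check whether a complex pole lies in the intersection of

    - the half-plane Re(s) < -alpha (minimum decay rate),
    - the disk |s| < r centered at the origin,
    - the cone |Im(s)| < -Re(s) * tan(theta) around the negative real axis.
    """
    re, im = pole.real, pole.imag
    decay_ok = re < -alpha
    radius_ok = abs(pole) < r
    cone_ok = abs(im) < -re * math.tan(theta)
    return decay_ok and radius_ok and cone_ok
```

For example, with $\alpha = 2$ and $\theta = 0.5236$ rad, a pole at $-3 + 1j$ satisfies the decay and cone conditions, whereas $-3 + 2j$ violates the cone condition because its damping is too low.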


As the D-Region constraint might not be ideal for the coupled dynamics of the system (all states are subject to the same decay constraint $\alpha$), an optimal controller design is pursued as well. The optimal controller LMI approach from [6] is used, utilizing the performance function:

$$J = \int_0^\infty \left( y^T(t) W y(t) + u^T(t) R u(t) \right) dt \tag{35}$$

The cost function (35) results in the minimization problem $J < x^T(0) P_i x(0) < \lambda$ with the following LMI constraints:

$$\min_{X_i, M_i, Y_{0,i}} \lambda \tag{36}$$

subject to

$$X_i > 0, \quad Y_{0,i} \geq 0, \quad \begin{bmatrix} \lambda & x^T(0) \\ x(0) & X_i \end{bmatrix} > 0, \quad \hat U_i + (s - 1) Y_{3,i} < 0, \tag{37}$$

where $s > 1$ and

$$\hat U_i = \begin{bmatrix} X_i A_i^T + A_i X_i - B_i M_i - M_i^T B_i^T & X_i C_i^T & -M_i^T \\ C_i X_i & -W^{-1} & 0 \\ -M_i & 0 & -R^{-1} \end{bmatrix} \tag{38}$$

$$Y_{3,i} = \text{block-diag}(Y_{0,i}, 0, 0) \tag{39}$$


The optimization problem above is a reduced form of the one in [6]: stability is only demanded for the local models and not for combinations $i \neq j$, and the relaxed stability condition is used, where $s$ is the maximum number of submodels that are active at the same time. The weighting matrices $W$ and $R$ are chosen constant and do not differ between the local constraints.

For handling the LMIs, the YALMIP interface together with the solver MOSEK is used [9], [10].

Figure 3: State comparison between the nonlinear CMG Furuta Pendulum equations evaluated using MATLAB's ode45 solver and measurements of the testbed system on the top, and RMSE errors at the bottom with $\dot q_1 = 0$ rad/s.


3 Results 


3.1 Model Validation 


The derived nonlinear equations of motion for the CMG Furuta Pendulum from 
the Euler-Lagrange approach (20) are simulated using the parameters provided 
in Table 1. The simulations are carried out using the MATLAB ode45 solver 
and are compared to measured values obtained from the testbed system. The 
comparison is evaluated through the Root Mean Squared Error (RMSE): 


$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( y(k) - \hat y(k) \right)^2} \tag{40}$$

for time intervals of 1, 1.5, 2, and 2.5 seconds. Two different sets of measurements are taken while the pendulum is falling from its upright position. For the first measurement, the disk is not spinning ($\dot q_1 = 0$ rad/s), and the states $x_{meas,1} = [\dot q_3, q_4]^T$ are measured, as depicted in Figure 3. At this point, the system essentially represents a Furuta Pendulum with the center of mass of the cantilever arm at the center of rotation of the frame.
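The RMSE in (40) and its evaluation over several time intervals can be sketched as follows. The sketch assumes that each interval starts at $t = 0$ and ends at the stated horizon, which is one possible reading of the evaluation; the function names are illustrative.

```python
import math


def rmse(measured, simulated):
    """Root mean squared error between two equally long sample sequences."""
    if len(measured) != len(simulated) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, simulated))
    return math.sqrt(squared / len(measured))


def rmse_over_horizons(t, measured, simulated, horizons=(1.0, 1.5, 2.0, 2.5)):
    """RMSE over the windows [0, T] for each horizon T (assumed windowing)."""
    result = {}
    for horizon in horizons:
        idx = [i for i, ti in enumerate(t) if ti <= horizon]
        result[horizon] = rmse([measured[i] for i in idx],
                               [simulated[i] for i in idx])
    return result
```

Evaluating the error over growing windows shows how quickly the simulation drifts away from the measurement as the pendulum falls.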
Table 1: System parameters of the CMG Furuta Pendulum testbed system

Variable | Value | Unit
$J_{Dxx}$ | 0.0027 | kg m^2
$J_{Dyy}$ | 0.0048 | kg m^2
$J_{Dzz}$ | 0.0027 | kg m^2
$J_{Gxx}$ | 0.0014 | kg m^2
$J_{Gyy}$ | 0.005 | kg m^2
$J_{Gzz}$ | 0.005 | kg m^2
$J_{Fxx}$ | 0 | kg m^2
$J_{Fyy}$ | 0 | kg m^2
$J_{Fzz}$ | 0.0414 | kg m^2
$J_{Pxx}$ | 0.003 | kg m^2
$J_{Pyy}$ | 0.003 | kg m^2
$J_{Pzz}$ | 0 | kg m^2
$\mu_1$ | 0.8 | kg m s^-1
$\mu_2$ | 72 | kg m s^-1
$\mu_3$ | 0.7 | kg m s^-1
$\mu_4$ | 0.135 | kg m s^-1
$L_1$ | 0.254 | m
$l_P$ | 0.246 | m
$m_P$ | 0.216 | kg


The second measurement is taken with the disk spinning at its maximum speed, $\dot q_{1,max} = 28.8$ rad/s. In this case, the states $x_{meas,2} = [q_2, \dot q_3, q_4]^T$ are measured to observe the effect of precession on the gimbal states and vice versa, as shown in Figure 4.


In the first measurement, the pendulum angle q4 shows an RMSE of around 
0.2 up to 2 seconds, whereas the frame angular velocity $\dot q_3$ exhibits an RMSE
of around 0.4. The frame velocity measurements particularly deviate from 
the simulation at the peaks, where the frame is accelerated due to the falling 
pendulum. The second measurement shows that the measured and simulated 
states strongly differ from each other. It is presumed that this discrepancy is 
due to the neglect of the motor actuating the disk, which is attached along the 
y-axis of the gimbal coordinate frame. The motor has a gearbox, increasing its 
weight and potentially offsetting the center of mass of the gimbal. Therefore, the assumption that all CMG centers of mass are located in the center of the CMG is not valid for the measured values.

Figure 4: State comparison between the nonlinear CMG Furuta Pendulum equations evaluated using MATLAB's ode45 solver and measurements of the testbed system on the top, and RMSE errors at the bottom with $\dot q_1 = 28.8$ rad/s.


Table 1 displays the system parameters for the testbed system, which were 
obtained through the CAD program Inventor, using an internal Finite Element 
Method (FEM) to calculate the inertia of the components. 


3.2 Controller Design 


The state feedback control structure is depicted in Figure 5. The controllable inputs of the system are $u_1 = \tau_1$ and $u_2 = \tau_2$. The torque $\tau_1$ actuating the disk is not controlled and is kept at a constant value to ensure the disk spins at its highest angular velocity. Input $u_2$ is controlled via the PDC control law with controller parameters synthesized through the LMI formulations given in Section 2.5. As premise variables, the current gimbal and pendulum angles $z = [q_2, q_4]^T$ are used. The disk position and velocity are neglected for the state feedback controller, leaving the states $x_{cntrl} = [q_2, \dot q_2, q_3, \dot q_3, q_4, \dot q_4]^T$ to be controlled via the PDC.

Figure 5: Basic control structure. P denotes the plant, $z$ the premise vector, $\tau_1$ and $\tau_2$ the inputs $u_1$ and $u_2$ into the system, respectively, and $x_{cntrl}$ the controlled system states.


The linearization points are chosen as $q_2 \in [-1.2, 1.2]$ rad for the gimbal angle and $q_4 \in [-0.5236, 0.5236]$ rad for the pendulum angle. The disk speed is set to $\dot q_1 = 28.8$ rad/s for Scenario 1 (Low-Speed Disk) and $\dot q_1 = 45$ rad/s for Scenario 2 (High-Speed Disk).


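Scenario 1 yields the following submodels of the TS-System:

$$A_{1,4,cntrl} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & -72 & 0 & 1241 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0.115 & -0.89 & 0 & -0.7 & 247 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0.081 & -0.65 & 0 & 0 & 2841 & -0.14 \end{bmatrix} \tag{41}$$

$$A_{2,3,cntrl} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & -72 & 0 & 1241 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0.115 & -0.89 & 0 & -0.7 & 247 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ -0.081 & -0.65 & 0 & 0 & 2841 & -0.14 \end{bmatrix} \tag{42}$$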
Table 2: D-Region constraints and optimal controller weighting for Low-Speed Disk (Scenario 1) and High-Speed Disk (Scenario 2) controller synthesis.

Scenario | D-Region | Optimal Controller
1 | $\theta = 0.5236$ rad, $\alpha = 2$ | $W = \text{diag}(0.1, 0.001, 0.1, 0.0001, 1, 0.001)$, $R = 0.3$, $x(0) = [0, 0, 0, 0, 0.27, 0]^T$
2 | $\theta = 0.5236$ rad, $\alpha = 2$ | $W = \text{diag}(0.1, 0.001, 0.1, 0.0001, 1, 0.001)$, $R = 0.3$, $x(0) = [0, 0, 0, 0, 0.37, 0]^T$


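$$B_{1,3,cntrl} = \begin{bmatrix} 0 & 0 \\ 0 & 264.61 \\ 0 & 0 \\ 16.6 & 0 \\ 0 & 0 \\ 12.1 & 0 \end{bmatrix}, \qquad B_{2,4,cntrl} = \begin{bmatrix} 0 & 0 \\ 0 & 264.61 \\ 0 & 0 \\ -16.6 & 0 \\ 0 & 0 \\ -12.1 & 0 \end{bmatrix} \tag{43}$$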


The parameters for controller synthesis in the different scenarios are listed in Table 2. The input $u_2 = \tau_2$ is limited to 2.5 Nm, which is the maximum torque of the motor actuating the gimbal.


For Scenario 1, we obtain the following gain sets for the D-Region controller

$$K_{1,4,D} = [-7.44, 0.075, -6.01, -8.15, -14.47, -1.99], \qquad K_{2,3,D} = [7.66, 0.076, 6.41, 9.8, -14.32, -3.49] \tag{44}$$

and the optimal controller:

$$K_{1,4,opt} = [-0.098, 0.13, -0.004, 0.102, -15.48, -2.87], \qquad K_{2,3,opt} = [-0.087, 0.15, 0.051, 0.17, -17.48, -3.32] \tag{45}$$

Table 3: Initial conditions for Low-Speed Disk (Scenario 1) and High-Speed Disk (Scenario 2) simulation studies.

Scenario | D-Region | Optimal Controller
1 | $x_0 = [0, 28.8, 0, 0, 0, 0, 0.2, 0]^T$ | $x_0 = [0, 28.8, 0, 0, 0, 0, 0.27, 0]^T$
2 | $x_0 = [0, 45, 0, 0, 0, 0, 0.2, 0]^T$ | $x_0 = [0, 45, 0, 0, 0, 0, 0.37, 0]^T$


3.3 Simulation Studies 


In total, four simulation studies are conducted to find the maximum initial angle $q_4$ of the pendulum from which the controller is able to return the pendulum to the unstable equilibrium $q_4 = 0$ rad.

1. Disk spinning at $\dot q_1 = 28.8$ rad/s, D-Region constraints
2. Disk spinning at $\dot q_1 = 28.8$ rad/s, optimal controller design
3. Disk spinning at $\dot q_1 = 45$ rad/s, D-Region constraints
4. Disk spinning at $\dot q_1 = 45$ rad/s, optimal controller design


The simulation setups allow for a comparison between the two controller synthesis procedures. The increase in the angular velocity of the disk from $\dot q_1 = 28.8$ rad/s to $\dot q_1 = 45$ rad/s shows whether control performance can be improved by increasing the dynamics of the actuator, since the actuating torque moving the frame through precession is defined as

$$\tau_3 = \dot q_2 \cdot \dot q_1 \cdot J_{Dyy} \cos(q_2) \tag{46}$$

As the goal of the simulations is to find the maximum angle $q_4$ the controller is able to recover, the initial conditions for the different scenarios are listed in Table 3. The results show that the maximum initial angle for the D-Region controller is $q_4 = 0.2$ rad for both Scenario 1 and Scenario 2. The optimal controller was able to recover the pendulum from $q_4 = 0.27$ rad for Scenario 1 and $q_4 = 0.37$ rad for Scenario 2.


Figure 6: Gimbal, frame, and pendulum angles $q_2$, $q_3$, and $q_4$, respectively, for the Low-Speed Disk and High-Speed Disk scenarios as well as a comparison between the D-Region and optimal controller

The simulation results are depicted in Figures 6, 7, and 8. Two main findings can be obtained. Firstly, the simulations indicate a limited stability region independent of controller synthesis due to various factors. The torque $\tau_3$ is limited by the velocity of the disk $\dot q_1$, and the maximum torque $\tau_2$ applied by the motor moving the gimbal limits $\dot q_2$, as seen in (46). Most notable is the $\cos(q_2)$ term from the precession in (46): if the gimbal is moved towards the angle $q_2 \to \pm\pi/2$, the effect on the frame through actuating the gimbal decreases significantly.
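The dependence of the precession torque (46) on the disk speed and the gimbal angle can be illustrated numerically. The sketch below assumes the disk inertia $J_{Dyy} = 0.0048$ kg m^2 from Table 1; the gimbal rate used in any call is a hypothetical value chosen for illustration.

```python
import math

J_DYY = 0.0048  # disk inertia about its spin axis in kg m^2 (read from Table 1)


def precession_torque(dq1, dq2, q2, j_dyy=J_DYY):
    """Torque on the frame according to tau_3 = dq2 * dq1 * J_Dyy * cos(q2)."""
    return dq2 * dq1 * j_dyy * math.cos(q2)


def torque_ratio(dq1_high, dq1_low, dq2, q2):
    """Ratio of the precession torques for two disk speeds at equal gimbal motion."""
    return precession_torque(dq1_high, dq2, q2) / precession_torque(dq1_low, dq2, q2)
```

For identical gimbal motion, raising the disk speed from 28.8 rad/s to 45 rad/s scales the available torque by 45/28.8, i.e., by roughly 56 %, while the torque vanishes as the gimbal angle approaches $\pm\pi/2$ regardless of the disk speed.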


Secondly, the simulations indicate that the decay constraints imposed by the D-Region have a negative impact on the maximum recoverable initial pendulum angle $q_4$. This is likely induced by the coupled dynamics of the system, as the body acting on the pendulum is the frame, which is actuated by the gimbal movement.

Increasing the disk angular velocity to 45 rad/s for Scenario 2 also increases the angular momentum of the disk proportionally, therefore increasing the torque $\tau_3$ obtained by actuating the gimbal with $\tau_2$.

Figure 7: Gimbal, frame, and pendulum angular velocities $\dot q_2$, $\dot q_3$, and $\dot q_4$, respectively, for the Low-Speed Disk and High-Speed Disk scenarios as well as a comparison between the D-Region and optimal controller


Figure 8 displays the torque $\tau_2$ demanded by the controller, which acts on the gimbal. It can be observed that at the beginning of the simulation, the recovery of the pendulum angle $q_4$ has the most significant impact on the demanded torque. The D-Region controller and the optimal controller exhibit different behaviors once the pendulum angle is recovered and stabilized. This difference is due to the less conservative decay constraints of the optimal controller on all states except the pendulum position, which is also evident in Figures 6 and 7.


4 Discussion 


The results presented in this paper demonstrate the capability of a Takagi-Sugeno Parallel Distributed Compensation (PDC) fuzzy controller to stabilize a highly nonlinear Control Moment Gyroscope (CMG)-actuated Furuta Pendulum at an unstable equilibrium point. It is found that the dynamics of the torque generated through precession, in combination with the controller design, have an impact on the maximum initial pendulum angle from which the controller is able to recover the pendulum to its unstable equilibrium. An optimal controller design, where the closed-loop dynamics of each state can be weighted independently, can outperform a design with a common decay rate constraint imposed through LMIs, especially when the actuator dynamics are high and the system dynamics are strongly coupled, as in the testbed system. Shortcomings primarily lie in the identification of the system behavior when the disk is spinning. Therefore, an adjustment of the hardware and the weight distribution of the gimbal might yield better results. Furthermore, the current testbed system has cables running off the motor that actuates the gimbal, as seen in Figure 1, which introduce random friction terms depending on the current cable positions. Switching to slip rings might be advantageous, as it would eliminate these random friction terms.

Figure 8: Demanded and saturated input torque $u_2 = \tau_2$ for the Low-Speed Disk and High-Speed Disk scenarios as well as a comparison between the D-Region and optimal controller


Furthermore, the current linearization points are chosen to include a wide range of angles the pendulum and gimbal can have during the recovery of the pendulum angle. Performance might be improved by increasing the number of linearization points.

It is noted that the D-Region constraints were not further optimized after showing acceptable performance; therefore, the D-Region controller might show some room for improvement.

The angles the controllers are able to recover are quite small, indicating a limitation of the dynamics of the system under the current actuating forces. Therefore, the current hardware is subject to improvement to increase the overall dynamics of the system.


5 Conclusion 


The control of a testbed system similar to the one investigated in this paper has, to the knowledge of the authors, only been reported in [1], where an LPV approach is pursued. Reference [1] focuses on the swing-up of the system and does not include an investigation of the stabilizing controller for the unstable equilibrium. It is only stated that the switching between the swing-up controller and the stabilizing controller is executed at a pendulum angle $|q_4| < 0.15$ rad and that the LPV scheduling region ranges over $q_2 \in [-60°, 60°]$ and $\omega_1 = \dot q_1 \in [30 \text{ rad/s}, 60 \text{ rad/s}]$. This paper presents an alternative control concept for the stabilizing controller through a fuzzy TS approach and LMIs, which in Scenario 2 achieved a stable recovery of the pendulum from an initial angle of $q_4 = 0.37$ rad under ideal conditions and with a single input, the controller actuating only the gimbal.


As the system parameters in [1] differ from the ones in this work, it remains to be investigated whether the fuzzy controller is able to show similar performance when the system parameters are adapted to match the system in [1].

It should be mentioned that, for the system parameters, [1] refers to [11], where the inertia tensors might not be given correctly in comparison to the system sketch. Reference [12] is dated later than [11] and contains corrected inertia tensors. Both [11] and [12] refer to the manual of the testbed system ECP 750 (Educational Control Products), which is, to the knowledge of the authors, not publicly accessible.
6 Future Work


Future work could include an extension to a multi-input system where the torque $\tau_1$ is actuated through the controller as well, as conducted in [1]. The premise variables could be extended to include the current disk velocity $\dot q_1$. The D-Region and optimal controller constraints can also be altered for each linearization point, increasing the flexibility of the controller design. As the number of linearization points is quite small, the controller synthesis could also be extended to find a global Lyapunov function.


Another object of future work is the design of a fuzzy TS swing-up controller, where further linearization points as well as an extension of the premise vector might yield satisfying results.


A significant improvement of the testbed system would be the adaptation of the hardware. Real-life tests could then be performed to show the performance of the controller on the hardware itself. The controller synthesis could furthermore be extended to an optimal-robust approach to ensure robustness against model uncertainty.
1 Introduction


Sliding mode observers (SMOs) provide a robust tool for state estimation and 
give additional information about disturbances and model uncertainties [1]. 
Thus, they are frequently deployed for fault detection and analysis. However, 
analysis often contains only low-pass filtering without any further identification 
scheme [2]. Yet, characterizing disturbances may be advantageous not only for 
disturbance control to prevent any harm to the plant and maintain its desired 
behavior, but also to ensure a longer life cycle of mechanical components, e.g. 
by actively compensating for disturbances with eigenfrequencies. While our previous work [3, 4] focused on the joint estimation of states and model uncertainties in general, this contribution transfers the concepts to robust estimation. In particular, we demonstrate how to efficiently and even automatically obtain dynamical representations of disturbances with an SMO, while also delivering correct state estimates. Ultimately, this insight can be utilized for disturbance control and model adaptation.


2 Sliding Mode Observer 


For the purpose of this contribution, an SMO is designed for the control of an inverted pendulum on a cart. The setup of the pendulum is displayed in Fig. 1a and its parameters are shown in Tab. 1b.

Parameter | Value | SI Unit
mass m | 0.654 | kg
gravity g | 9.81 | m/s^2
length a | 0.267 | m
inertia J | 0.0101 | kg m^2
damping d | 0.001 | Nms

(a) Setup at our laboratory (b) Table of parameters

Figure 1: Pendulum on a cart and its characteristics


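Its dynamics are described by the following:

$$\frac{d}{dt} \begin{pmatrix} \varphi \\ \dot\varphi \\ s \\ \dot s \end{pmatrix} = \begin{pmatrix} \dot\varphi \\ \dfrac{a m \cos(\varphi) \cdot u + m g a \sin(\varphi) - d \dot\varphi}{J + m a^2} \\ \dot s \\ u \end{pmatrix}, \tag{1}$$

with $\varphi$ denoting the angle, $s$ the position of the cart, and $\dot\varphi$, $\dot s$ the respective velocities. However, for simplicity we consider a second-order system in its nonlinear observability canonical form with dynamics $f$ [5] to describe an SMO, since it can be easily adapted to the pendulum on a cart. Then, the corresponding SMO takes the following form

$$\dot{\hat x}_1 = \hat x_2 + v_1(e_y), \qquad \dot{\hat x}_2 = \hat f(\hat x_1, \hat x_2, u) + v_2(e_y), \qquad e_y = y - \hat y = x_1 - \hat x_1 = e_1, \tag{2}$$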


with $\hat x_1, \hat x_2$ denoting the estimated states, $\hat f$ representing the model of the system, and $e_y$ indicating the measurement error. Hence, the error dynamics with $e_1 = x_1 - \hat x_1$ and $e_2 = x_2 - \hat x_2$ are deduced with Eq. (2) by

$$\dot e_1 = e_2 + v_1(e_y), \qquad \dot e_2 = \Delta f + v_2(e_y). \tag{3}$$

Here, $\Delta f$ is the deviation between the system $f$ and the model $\hat f$. The parameters $k_i$ of the injection terms $v_i(e_y) = -k_i \, \text{sign}(e_y)$ control the stability, effectiveness, and convergence rate of the estimation. In particular, $k_2$ needs to be chosen such that $k_2 > |\Delta f|$ holds [1, 2], but correctly guessing the maximal model deviation often remains a challenge.


3 Data-driven disturbance identification 


Since these injection terms $v_i(e_y) = -k_i \, \text{sign}(e_y)$ are available at any time, we can utilize them for identifying the model deviation $\Delta f$. Assuming that the SMO is reasonably well parameterized with design parameters $k_i$ and has reached its sliding phase, we can not only expect $e_y \to 0$ but also $\dot e_2 \to 0$. Thus, we obtain $\Delta f = -v_2(e_y)$. Now, instead of low-pass filtering $\Delta f$ [2], which is the usual way to track potential disturbances, we seek a physically interpretable representation of the disturbances in addition to capturing their dynamics. By this, we gain more insight into the disturbances and are able, e.g., to compensate for them actively or to analyze their effects on the life cycle of affected components such as actuators. Moreover, this information can be utilized for model adaptation. Simply assume a linear combination of $n_\theta$ suitable, physics-based terms stored within a library $\Psi \in \mathbb{R}^{n_\theta}$ that incorporate one's hypotheses about which characteristics the disturbances may exhibit. Therefore, the following holds for the approximation of $\Delta f$ by the parameters $\theta \in \mathbb{R}^{n_\theta}$:

$$e_\theta = \Delta f - \theta^T \Psi(\hat x, u). \tag{4}$$


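For useful insights into $\Delta f$, the interpretation error $e_\theta$ must tend towards zero, ideally for $t \to \infty$. Hence, the optimal $\hat\theta$ is found by minimizing

$$\hat\theta = \arg\min_\theta \int_0^t e_\theta^2 \, d\tau = \arg\min_\theta \int_0^t \left( -v_2(e_y) - \theta^T \Psi(\hat x, u) \right)^2 d\tau, \tag{5}$$

whose solution is given by [6]

$$\hat\theta = \left( \int_0^t \Psi(\hat x, u) \Psi(\hat x, u)^T d\tau \right)^{-1} \int_0^t \Psi(\hat x, u) \left( -v_2(e_y) \right) d\tau. \tag{6}$$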


By using an efficient, dynamic calculation for $\Psi$ [6], the inverse in (6) does not need to be recomputed completely at every time update. To account for changing characteristics and time-dependent behavior, the cost function can be weighted by a time factor if necessary.
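The least-squares solution (6) can be sketched for sampled data by replacing the integrals with sums over the samples. The sketch below is a batch version with a small Gaussian elimination, not the efficient dynamic calculation from [6]; the function names and the synthetic data in the example are assumptions for illustration.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("library terms are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def estimate_theta(library_samples, delta_f_samples):
    """Discrete counterpart of (6): (sum Psi Psi^T)^-1 (sum Psi * delta_f)."""
    n = len(library_samples[0])
    gram = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for psi, df in zip(library_samples, delta_f_samples):
        for i in range(n):
            rhs[i] += psi[i] * df
            for j in range(n):
                gram[i][j] += psi[i] * psi[j]
    return solve_linear(gram, rhs)


def example():
    """Synthetic check: delta_f contains only the 3*pi term with amplitude 12."""
    times = [0.01 * k for k in range(1000)]
    library = [[math.sin(w * math.pi * t + math.pi / 2) for w in (1, 3, 5)]
               for t in times]
    delta_f = [12.0 * math.sin(3 * math.pi * t + math.pi / 2) for t in times]
    return estimate_theta(library, delta_f)
```

In this synthetic example, `delta_f` stands in for $-v_2(e_y)$; the estimate assigns the full amplitude to the matching library term and drives the remaining parameters towards zero.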

However, choosing terms $\psi_j$ for $\Psi$ beforehand is difficult if no prior knowledge regarding the disturbances' characteristics is available. A solution to this challenge is to collect information regarding the disturbances, e.g., by a Fourier transformation for oscillations. Fig. 2 illustrates the Fourier transformation for the example used in Sec. 5, which identifies the three most important frequencies. Thus, using the Fourier transformation helps to identify the main frequencies, which are then automatically included as trigonometric terms within $\Psi$ as characteristics of $\Delta f$.
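The selection of library frequencies from a Fourier transformation can be sketched with a plain discrete Fourier transform and a simple peak search. The sampling rate, the signal length, and the peak criterion are assumptions for illustration; a practical implementation would typically use an FFT library and windowing.

```python
import cmath
import math


def dft_magnitudes(samples):
    """One-sided amplitude spectrum of a real signal (direct DFT, O(N^2))."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        magnitudes.append(2.0 * abs(acc) / n)
    return magnitudes


def dominant_frequencies(samples, fs, count):
    """Return the `count` largest spectral peaks as (frequency in Hz, amplitude)."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    peaks = [k for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    return [(k * fs / n, mags[k]) for k in peaks[:count]]


def example():
    """Signal with components at angular frequencies pi and 3*pi."""
    fs, n = 20.0, 400
    signal = [math.sin(math.pi * i / fs + math.pi / 2)
              + 4.0 * math.sin(3 * math.pi * i / fs + math.pi / 2)
              for i in range(n)]
    return dominant_frequencies(signal, fs, 2)
```

Each identified frequency $f$ in Hz corresponds to the angular frequency $\omega = 2\pi f$, which can then be inserted into a trigonometric library term of the form $\sin(\omega t + \pi/2)$.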
Figure 2: Fourier transformation to identify the main frequencies for choosing library terms $\psi_j$ automatically


4 Results and outlook 


To illustrate the effects of our proposed method, we present results from an open-loop scenario, since it is easier to interpret the outcomes without the controller's influence. However, similar results have been obtained for closed-loop behavior using a linear quadratic controller combined with a DMOC optimal trajectory [7].

Applying an external disturbance $p(t) = 4\sin(3\pi t + \pi/2)$ in addition to the excitation $u(t) = \sin(\pi t + \pi/2)$ on the test bench at our laboratory, which affects the cart's position, we check whether the proposed SMO automatically identifies the additional dynamics. Therefore, we compare two SMOs with libraries that are constructed differently. First, a library is set up by prior knowledge that contains the dynamics of $p(t)$. Thereafter, a library is constructed by the Fourier transform, whose identified frequencies are utilized within it. Both libraries finally exhibit identical terms $\psi_j$ to compare their performance, namely

$$\Psi(\hat x, u) = \left( \sin(\hat\varphi), \; \dot{\hat\varphi}, \; \text{sign}(\dot{\hat\varphi}), \; \sin(\pi t + \pi/2), \; \sin(3\pi t + \pi/2), \; \sin(5\pi t + \pi/2) \right)^T. \tag{7}$$
As Fig. 2 depicts, the frequency of $p(t)$, namely $\omega = 3\pi$, is identified correctly by the Fourier transformation. It also recognizes the frequency of the excitation $u(t)$ at $\omega = \pi$. Using this information, Fig. 3 then shows excerpts regarding the convergence of the parameters $\hat\theta$ and the model deviation expressed by $v_2(e_y)$. If the library is set up by prior knowledge, the orange dashed signal in Fig. 3a shows that the deviation reduces much faster than when the relevant terms for $\Psi$ first need to be determined by a Fourier transformation, which is illustrated by the blue dashed signal. However, in cases when we do not have any information regarding $p(t)$, this approach enables a fully automated identification of disturbances and requires only slightly more convergence time due to the necessary collection of data, which lasts around 26 s in this case. Note that it ultimately arrives at a similar level of error compared to when prior knowledge is used directly.


(a) Convergence of $v_2(t)$: It converges more slowly if no prior knowledge is used, due to the necessary data collection for the Fourier transformation (blue), compared to when prior knowledge is applied (red).

(b) Convergence of selected $\hat\theta$: Parts of $p(t)$ are identified correctly, although convergence rates vary due to how the library terms are determined.

Figure 3: Excerpts from the identification of $\Delta f$ by prior and automatically chosen $\Psi$

Considering the course of the parameters $\hat\theta$, both strategies show strong convergence rates, only varying in speed due to the data collection and analysis of the Fourier transformation. Yet, both converge to the same value and deliver consistent results, e.g., identifying the term $\psi(t) = \sin(3\pi t + \pi/2)$ as present within the disturbances and neglecting the term $\psi(t) = \sin(5\pi t + \pi/2)$ by convergence towards zero. However, it can be noticed that the identified parameter $\hat\theta_{\sin(3\pi t + \pi/2)}$, which both strategies converge to, does not coincide with the amplitude of $p(t)$. This results from the fact that $p(t)$ acts directly on the control input $u(t)$, which enters Eq. (1) through the term $a m \cos(\varphi) (J + m a^2)^{-1} \cdot u$. Due to the angle's oscillation around $-\pi$, as can be seen later in Fig. 5, which results in $\cos(\varphi) \approx -1$, the factor $a m \cos(\varphi) (J + m a^2)^{-1}$ tends to a magnitude of approximately 3. Thus, both SMOs identify the overall amplitude of $p(t)$ as 12, assuming the factor belongs to the disturbance rather than to the control input.
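The factor and the resulting amplitude can be checked with the parameters from Tab. 1b; the following short calculation is a worked example added for illustration:

$$\frac{a m}{J + m a^2} = \frac{0.267 \cdot 0.654}{0.0101 + 0.654 \cdot 0.267^2} \approx \frac{0.1746}{0.0567} \approx 3.08, \qquad 4 \cdot 3.08 \approx 12.3$$

With $\cos(\varphi) \approx -1$, the factor in front of the input therefore has a magnitude of about 3, and a disturbance amplitude of 4 on the input appears with an amplitude of about 12 in $\Delta f$.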


Moreover, in addition to the decrease of $v_2(t)$, we verify whether the SMOs identify the disturbance $p(t)$ correctly. Hence, Fig. 4 shows an excerpt of a comparison between the disturbance $p(t)$ and its approximation by the linear combination $\hat\theta^T \Psi(\hat x, u)$ with the Fourier transformation once the parameters have converged. It reveals that the approximation captures the disturbance well, although some minor deviations can be recognized. Note that $p(t)$ is displayed with the factor acting on the control input, since the SMO assumes it to act on the disturbance.

Figure 4: Excerpt from the comparison of the disturbance $p(t)$ and its approximation by $\hat\theta^T \Psi(\hat x, u)$

Figure 5: State estimation when the pendulum is excited by $u(t)$ and the observer is subject to the additional disturbance $p(t)$


Since the injection term $v_2(t)$ decreases significantly over time and is already very low at the beginning of the estimation, the quality of the state estimation is expected to be high throughout. Fig. 5 confirms this impression by presenting the trajectories of the pendulum over time. Due to its good parameterization with $k_i$, the sliding mode observer captures the pendulum's dynamical behavior very well right from the beginning, without any major estimation errors, even though the disturbance is present and not yet fully identified.


In conclusion, this contribution showed the concept of joint estimation deployed within a sliding mode observer. It highlighted the advantages that result from disturbance identification and additionally presented the option to automatically obtain candidate functions for the library by the Fourier transformation. Further, the approach achieved a high quality of state estimation while providing more insight into the present disturbances. Future research can use these strategies for intelligent fault management and as a tool for online model adaptation.
Abstract


In an era where deep learning models are increasingly deployed in safety- 
critical domains, ensuring their reliability is paramount. The emergence of 
adversarial examples, which can lead to severe model misbehavior, under- 
scores this need for robustness. Adversarial training, a technique aimed at 
fortifying models against such threats, is of particular interest. This paper 
presents an approach tailored to adversarial training on tabular data within 
industrial environments. 


The approach encompasses various components, including data preprocess- 
ing, techniques for stabilizing the training process, and an exploration of di- 
verse adversarial training variants, such as Fast Gradient Sign Method (FGSM), 
Jacobian-based Saliency Map Attack (JSMA), DeepFool, Carlini & Wagner 
(C&W), and Projected Gradient Descent (PGD). Additionally, the paper delves 
into an extensive review and comparison of methods for generating adversarial 
examples, highlighting their impact on tabular data in adversarial settings. 


Furthermore, the paper identifies open research questions and hints at future 
developments, particularly in the realm of semantic adversarials. This work 
contributes to the ongoing effort to enhance the robustness of deep learning 
models, with a focus on their deployment in safety-critical industrial con- 
texts. 
1 Introduction


In recent years, artificial intelligence (AI) has witnessed tremendous advance- 
ments, revolutionizing various domains and becoming an integral part of our 
daily lives. From computer vision systems [1, 2] to natural language processing 
[19, 4] and object detection [5] for autonomous vehicles, deep learning models 
have showcased remarkable capabilities, surpassing human performance in 
many complex tasks. In particular, AI has attracted intense media interest
due to the capabilities of ChatGPT [4]. However, as AI systems become
increasingly integrated into critical applications, ensuring their reliability and 
robustness becomes imperative. 


One of the key challenges in the deployment of deep learning models is their 
vulnerability to adversarial examples (AEs). AEs are carefully crafted pertur- 
bations applied to input data, often imperceptible to humans, that can cause 
deep learning models to misbehave or produce incorrect predictions [6]. The 
existence of AEs has raised significant concerns about the reliability and secu- 
rity of AI systems, particularly in safety-critical domains such as healthcare, 
autonomous driving, and industrial automation. 


Nowadays, industrial production plants are intelligent technical systems. These 
cyber-physical production systems can be severely affected by AEs, causing 
major financial or personnel damage. An attacker can either stop systems with- 
out an anomaly being present or allow them to continue operating even though 
a fault has occurred [7, 8]. Notably, in the industrial context, data exhibits high 
heterogeneity, diverging significantly from the limited value ranges typically 
encountered in image data, the origin of AEs. Additionally, industrial data can 
often be unstructured and accompanied by sparse labels. To effectively employ 
common AE generation algorithms, preprocessing of industrial data becomes 
a necessary step. 


This paper delves into the practical application of AEs within the industrial 
landscape. Specifically, this paper encompasses the following key elements: 


• an exploration of prevalent AE generation algorithms,

• practical insights into adversarial training techniques,

• preprocessing methodologies tailored for tabular data,

• a comparative analysis of diverse adversarial attacks, evaluating their
suitability for adversarial training with tabular data drawn from the
industrial context,

• identification of ongoing challenges and a prospective outlook on future
research avenues.


2 Related Work and Preliminaries 


This section provides background information and relevant methods for gen- 
erating adversarial examples (AEs) and countermeasures to enhance robust- 
ness. 


2.1 Adversarial Examples 


The concept of AEs was initially introduced by Szegedy et al. [6] and Biggio 
and Roli [9]. In general they can be defined as: 

Let $x \in \mathbb{R}^n$ be an input with true label $y_0$, and let $y_t$ be a (target) label different from
$y_0$. An AE $x'$ results from a mapping $\mathcal{A}: \mathbb{R}^n \to \mathbb{R}^n$ such that the modified input
$x' = \mathcal{A}(x)$ is misclassified as $y_t$ without changing its true class.


However, the mapping $\mathcal{A}(\cdot)$ is often limited to a linear operation [10], so that an
additive perturbation $\delta$ is introduced:


$$x' = x + \delta.$$


To avoid changing the original class membership, $\delta$ must be small w. r. t. a
distance metric. On image data, $\delta$ is commonly minimized in the literature
[10] w. r. t. the $L_p$ norm


$$\|x' - x\|_p = \|\delta\|_p = \left( \sum_{i=1}^{n} |\delta_i|^p \right)^{1/p}$$


to create AEs that are visually indistinguishable to human observers. In
particular, the $L_0$, $L_2$, and $L_\infty$ norms are employed [11]. The $L_0$ norm represents the
number of changed features or pixels, the Euclidean distance is measured with
the $L_2$ norm, and the $L_\infty$ norm indicates the maximum change of a feature or
pixel.
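

To make the three norms concrete, the following minimal sketch evaluates them for a small perturbation. It is an illustration only: the function name `lp_norm` and the example vectors are assumptions and do not come from the paper.

```python
import math

def lp_norm(delta, p):
    """Return the L_p norm of a perturbation given as a list of floats."""
    if p == 0:
        # L_0: number of features that were changed at all
        return float(sum(1 for d in delta if d != 0.0))
    if p == math.inf:
        # L_inf: largest absolute change of any single feature
        return max(abs(d) for d in delta)
    # General case: (sum_i |delta_i|^p)^(1/p)
    return sum(abs(d) ** p for d in delta) ** (1.0 / p)

# Hypothetical scaled input and adversarial counterpart (two features changed)
x = [0.20, 0.50, 0.90, 0.10]
x_adv = [0.20, 0.53, 0.86, 0.10]
delta = [a - b for a, b in zip(x_adv, x)]

print(lp_norm(delta, 0), lp_norm(delta, 2), lp_norm(delta, math.inf))
```

The same perturbation is "small" or "large" depending on the norm chosen, which is why attacks optimized for different norms are hard to compare directly.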


2.2 Adversarial Attacks 


To generate AEs, a variety of approaches have been proposed, again with 
various modifications [10]. In the following, the most influential basic methods 
are presented, which will be compared later. Only white-box methods were 
considered, i. e. those that have complete knowledge of all parameters, as they 
allow for the strongest attacks [10]. 


Fast Gradient Sign Method 


Szegedy et al. describe the generation of AEs as a constrained optimization 
problem [6]. They leverage the box-constrained limited-memory Broyden- 
Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to obtain solutions. However, 
to reduce the computational cost, Goodfellow et al. introduce the Fast Gradient 
Sign Method (FGSM) [12]. Here, the gradients $\nabla_x$ are calculated once for all input
features. Each input feature is then modified in the gradient ascent direction by a
fixed step size $\epsilon$ to maximize the loss function $\mathcal{L}$:


$$\delta = \epsilon \cdot \operatorname{sign}(\nabla_x \mathcal{L}(x, y_0)). \tag{1}$$


Since the step size $\epsilon$ is equal for all input features and they are all modified
at once, the FGSM is optimized for the $L_\infty$ norm. Furthermore, the FGSM is
fast to compute but not an optimal solution. Kurakin et al. [13, 14] provide an 
iterative version of this attack, which leads to more sophisticated AEs. 
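

Equation (1) can be sketched in a few lines. The example below is not the paper's implementation (which uses a DNN and existing attack libraries); it assumes a binary logistic-regression model so that the input gradient has a closed form, and all names (`fgsm`, `input_gradient`, the weights) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model (assumed stand-in)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

def input_gradient(w, b, x, y):
    """Closed-form gradient of the loss w.r.t. the input features."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """Single-step attack: x' = x + eps * sign(grad), clipped to [0, 1]."""
    grad = input_gradient(w, b, x, y)
    x_adv = [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
    return [min(1.0, max(0.0, v)) for v in x_adv]

# Hypothetical model parameters and one scaled sample with true label 1
w, b = [2.0, -3.0, 0.5], 0.1
x, y = [0.6, 0.2, 0.5], 1
x_adv = fgsm(w, b, x, y, eps=0.01)
print(x_adv, loss(w, b, x, y), loss(w, b, x_adv, y))
```

Every feature moves by exactly the step size, which is what ties the method to the $L_\infty$ norm.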
Jacobian-based Saliency Map Attack


Papernot et al. introduced the Jacobian-based Saliency Map Attack (JSMA) 
[16]. They compute the Jacobian matrix for a specific target class w. r. t. its 
input features. Based on these partial derivatives a saliency map is constructed 
indicating the influence of each input feature. Subsequently, the most
influential input feature is modified accordingly, and the result is checked
for whether an AE is present. This process is repeated until a predefined
number of features has been altered or an AE has been found. Due to the
successive nature of feature changes, the JSMA is optimized for the $L_0$ norm.


DeepFool 


The basic idea of the DeepFool algorithm [15] is to view the model as an affine
transformation, i. e. the authors linearize the model's decision boundary around
an input x. In binary classification the decision boundary becomes a hyperplane,
and in the multinomial case the decision boundaries around the input x are
approximated with a polyhedron formed by each of the decision hyperplanes.
They project the input orthogonally, i. e. with minimum distance, onto the nearest
hyperplane and push it slightly beyond it to craft an AE. Since the linearization
is an approximation, they just take a step in the direction of this projection and
iterate this process until an AE is reached. In the original version the algorithm
is optimized for the $L_2$ norm.


Carlini & Wagner 


Carlini and Wagner [11] offer a variety of attacks with $L_0$, $L_2$, and $L_\infty$ distance
metrics. However, they claim their $L_2$ attack (C&W) to be the strongest one,
and in fact, the $L_0$ version leverages the $L_2$ attack. They iteratively optimize an
objective function consisting of a misclassification term and a distance
measure of the perturbation. Furthermore, they exploit a scaled and shifted tanh
function with a change of variables


$$\delta = \frac{1}{2} \left( \tanh(w) + 1 \right) - x \tag{2}$$


to let the perturbed input $x + \delta$ map natively into the interval [0,1]. By eliminating the
necessity of clip functions in this way, they are able to employ momentum- 
based optimizers such as Adam [17]. The C&W attack is one of the strongest 
attacks in terms of finding minimal perturbation and fooling the machine learn- 
ing model [11]. Additionally, they overcame numerous defensive strategies 
[19], such as Defense Distillation [18], that existed at the time of release. 
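

The effect of the tanh change of variables can be illustrated as follows. This is only a sketch of the box-constraint trick, not the full C&W optimization; the helper names and the `margin` parameter are assumptions introduced for the example.

```python
import math

def to_unit_interval(w):
    """Map unconstrained variables w to inputs in [0, 1]: 0.5 * (tanh(w) + 1)."""
    return [0.5 * (math.tanh(wi) + 1.0) for wi in w]

def from_unit_interval(x, margin=1e-6):
    """Inverse mapping; `margin` (an assumption) keeps atanh finite at 0 and 1."""
    clipped = [min(1.0 - margin, max(margin, xi)) for xi in x]
    return [math.atanh(2.0 * xi - 1.0) for xi in clipped]

x = [0.2, 0.5, 0.9]
w = from_unit_interval(x)

# An optimizer may move w anywhere; the mapped input never leaves [0, 1],
# so no clipping step is needed.
w_moved = [wi + 5.0 for wi in w]
x_adv = to_unit_interval(w_moved)
delta = [a - b for a, b in zip(x_adv, x)]
print(x_adv, delta)
```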


Projected Gradient Descent 


As Gradient Descent is a standard way to solve an unconstrained optimiza- 
tion problem, Projected Gradient Descent (PGD) in general provides a way 
to solve constrained optimization problems. The PGD attack [20] leverages 
this approach to generate AEs. One starts from a random perturbation in an
$L_p$ ball around an input sample, takes a step in the gradient direction of the
loss function w. r. t. its input data and, if necessary, projects the result back
into the $L_p$ ball. This procedure is repeated until convergence or until
the maximum number of iterations is exceeded. Accordingly, Madry et al. [20] refer to the
iterative FGSM as an $L_\infty$-bounded PGD attack, where the projection is realized
by the clipping function. The authors claim that the PGD method is probably
the strongest first-order attack. They argue that AEs generated with it are more 
suitable for adversarial training, since models are also robust against weaker 
methods after training with these AEs. 
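

The procedure described above (random start, gradient step, projection) can be sketched for the $L_\infty$ case as follows. As a simplifying assumption, a logistic-regression model stands in for the DNN so that the input gradient is available in closed form; the step size `alpha`, the seed, and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss w.r.t. the input features."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def pgd_linf(w, b, x, y, eps, alpha, steps, seed=0):
    rng = random.Random(seed)
    # Random start inside the L_inf ball of radius eps around x
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(steps):
        grad = input_gradient(w, b, x_adv, y)
        # Signed gradient ascent step of size alpha
        x_adv = [xa + alpha * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Projection onto the ball: clip every feature to [x_i - eps, x_i + eps]
        x_adv = [min(xi + eps, max(xi - eps, xa)) for xi, xa in zip(x, x_adv)]
        # Keep the sample inside the scaled feature range [0, 1]
        x_adv = [min(1.0, max(0.0, xa)) for xa in x_adv]
    return x_adv

w, b = [2.0, -3.0, 0.5], 0.1
x, y = [0.6, 0.2, 0.5], 1
x_adv = pgd_linf(w, b, x, y, eps=0.01, alpha=0.0025, steps=10)
print(x_adv)
```

For a linear model the gradient sign never changes, so the iterate simply walks to the corner of the ball; for a DNN the repeated re-evaluation of the gradient is what makes PGD stronger than a single FGSM step.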


2.3 Adversarial Defenses


The sequence of developments in countermeasures for adversarial examples is 
similar to the history of cryptography. After methods for defense are proposed, 
there are new attack strategies, which in turn overcome them [19]. Defensive 
approaches that do not require the secrecy of specific aspects, such as gradients, 
are therefore to be preferred here as well [21]. 


Adversarial training is a primary strategy for enhancing the adversarial robust- 
ness of neural networks. By introducing AEs during training [6, 12], models 
can be designed to be more robust to small perturbations. Madry et al. consider 
adversarial training as a saddle point or min-max problem [20]. On the one
hand, the goal is to generate AEs that maximize the loss function and, on 
the other hand, to find model parameters that minimize this loss. Moreover, 
Tsipras et al. demonstrate that adversarial training can lead to more robust 
features, which, however, are obtained at the expense of accuracy [22]. A 
more detailed overview of adversarial training can be found in [23]. 
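

For reference, the saddle-point view is commonly written as follows. The model parameters $\theta$, the data distribution $\mathcal{D}$, and the budget $\epsilon$ are notation assumed here for illustration; the remaining symbols follow Section 2.1.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y_0) \sim \mathcal{D}}
\left[ \max_{\|\delta\|_p \le \epsilon} \mathcal{L}(\theta, x + \delta, y_0) \right]
```

The inner maximization corresponds to the attack that searches for the worst-case perturbation, and the outer minimization corresponds to ordinary training on those worst-case inputs.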


3 Approach 


In this section, an approach is developed that facilitates cross-comparison of 
AE generation methods. To ensure comparability among the presented methods,
appropriate metrics must be selected. However, there is no uniform definition
of quantifiable adversarial robustness in the literature. Additionally,
adversarial attacks are optimized w. r. t. different $L_p$ norms, further complicating
the assessment of AE quality. To address these challenges, we first empirically 
test whether adversarial training enhances model robustness against attacks 
using the same method as in training. To achieve this, we employ both the 
accuracy on the original data and the accuracy on the AEs as metrics. Subse- 
quently, models trained using one method are evaluated against the remaining 
attacks. 


Another critical consideration is the nature of the data. Humans have less 
intuition for tabular, numerical data compared to speech or images [24]. In the 
image domain, the $L_p$ norm serves as an approximation for human perception.
Visual inspection helps assess whether visible artifacts are present in the AEs.
This makes it possible to establish a budget for the adversarial attacks such that these
artifacts are minimized while ensuring that the original class membership of 
the sample is maintained. However, this is not feasible for tabular data, so 
alternative constraints on the adversarial attacks are required. The specific 
limitation of the attack methods is detailed in the next section. 


Moreover, various approaches exist for conducting adversarial training. In 
[20], the iterative training is exclusively performed on the AEs to reduce com- 
putational costs, arguing that AEs already offer greater diversity than the orig- 
inal data points. Conversely, Specht et al. compute AEs only once and not 
iteratively within the training, augmenting their original training dataset once
with an equivalent amount of AEs [7, 8]. However, our initial tests failed to 
reproduce sufficient robustness when training solely on once-generated AEs. 
Given the trade-off between adversarial robustness and accuracy [22], we adopt 
a mixed approach. We include the original data in training to prioritize ac- 
curacy, but AEs are recalculated in each minibatch with the current model 
parameters. For each input, an AE is computed without applying a weighting 
parameter, ensuring that AEs and original data have equal influence on the 
loss. 


Additionally, we introduce a one-epoch warm-up phase to stabilize the training 
process. During this phase, only the original data is utilized. Starting from 
epoch two, a mixture of AEs and original data is incorporated. The warm- 
up phase is essential as AEs, considered as worst-case inputs, are typically 
more challenging to learn than the distribution of the original data, which 
can potentially interfere with finding the appropriate parameters at the begin- 
ning. Eliminating the warm-up phase in later implementations resulted in some 
classes not receiving any predictions at all. 
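

The training scheme described above (clean warm-up epoch, then equally weighted clean data and freshly computed AEs in every minibatch) can be sketched as follows. This is a toy stand-in, not the paper's code: it assumes a logistic-regression model, FGSM as the attack, synthetic two-feature data, and hypothetical hyperparameters.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One-step L_inf attack on the current model, clipped to [0, 1]."""
    p = predict(w, b, x)
    x_adv = []
    for xi, wi in zip(x, w):
        g = (p - y) * wi                      # d loss / d x_i
        step = eps * ((g > 0) - (g < 0))      # eps * sign(g)
        x_adv.append(min(1.0, max(0.0, xi + step)))
    return x_adv

def sgd_step(w, b, batch, lr):
    """Average-gradient update; every sample in the batch has equal weight."""
    gw, gb = [0.0] * len(w), 0.0
    for x, y in batch:
        err = predict(w, b, x) - y
        gw = [g + err * xi for g, xi in zip(gw, x)]
        gb += err
    n = len(batch)
    return [wi - lr * g / n for wi, g in zip(w, gw)], b - lr * gb / n

def adversarial_training(data, epochs=20, batch_size=8, lr=0.5,
                         eps=0.01, warmup_epochs=1, seed=0):
    rng = random.Random(seed)
    samples = list(data)
    w, b = [0.0] * len(samples[0][0]), 0.0
    for epoch in range(epochs):
        rng.shuffle(samples)
        for i in range(0, len(samples), batch_size):
            clean = samples[i:i + batch_size]
            batch = list(clean)
            if epoch >= warmup_epochs:
                # After the warm-up, AEs are recomputed for every minibatch
                # with the current parameters and mixed with the clean data.
                batch += [(fgsm(w, b, x, y, eps), y) for x, y in clean]
            w, b = sgd_step(w, b, batch, lr)
    return w, b

def accuracy(w, b, data):
    hits = sum((predict(w, b, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)

# Synthetic toy data in [0, 1]^2 (assumption): class 1 if x0 > x1, with a margin
rng = random.Random(1)
data = []
while len(data) < 200:
    x = [rng.random(), rng.random()]
    if abs(x[0] - x[1]) >= 0.1:
        data.append((x, 1 if x[0] > x[1] else 0))

w, b = adversarial_training(data)
adv = [(fgsm(w, b, x, y, 0.01), y) for x, y in data]
print("clean accuracy:", accuracy(w, b, data))
print("adversarial accuracy:", accuracy(w, b, adv))
```

Because the clean samples and their AEs are simply concatenated before the averaged update, both contribute equally to the loss without an explicit weighting parameter.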


Furthermore, the heterogeneous nature of industry data requires adjustments to 
restrict the range of feature values. This prevents the emergence of unrealistic 
values and eliminates the need to adapt algorithms, as they naturally operate 
in the constrained range $x + \delta \in [0,1]^n$, originating from the image processing
domain. Achieving this is straightforward through a min-max scaler. However, 
when scaling, it is important to consider the variance within the dataset. Special 
attention must be paid to extreme outliers that would distribute the majority of 
data into a significantly smaller interval. 
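

The outlier effect mentioned above is easy to reproduce. The sketch below scales one feature column with a plain min-max scaler and, as one possible remedy that is an assumption here (the paper does not state how outliers were handled), with explicitly chosen bounds followed by clipping.

```python
def min_max_scale(column, lower=None, upper=None):
    """Scale a feature column to [0, 1].

    If `lower`/`upper` are given (hypothetical, e.g. robust percentile bounds),
    values outside these bounds are clipped instead of stretching the range.
    """
    lo = min(column) if lower is None else lower
    hi = max(column) if upper is None else upper
    if hi == lo:
        return [0.0 for _ in column]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in column]

# Hypothetical feature with one extreme outlier
column = [1.0, 2.0, 3.0, 4.0, 1000.0]

plain = min_max_scale(column)
clipped = min_max_scale(column, lower=1.0, upper=4.0)

print(plain)    # the four regular values are squeezed into a tiny interval
print(clipped)  # the regular values use the full range, the outlier saturates
```

With plain scaling, a perturbation budget of 1% of the feature range would exceed the entire spread of the regular values, so the choice of bounds directly affects how meaningful a fixed budget is.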
4 Evaluation


4.1 Experimental Setup 
Dataset 


The experimental results are obtained on the Sensorless Drive Diagnosis (SDD)
dataset [25]. This dataset is derived from two-phase currents measured in
a 425 W permanent magnet synchronous motor, which is part of a modular
demonstrator, as detailed in [26, 27]. The demonstrator itself is comprised of 
several components, including the test motor, measuring shaft, bearing module, 
flywheel, and load motor. To simulate various fault conditions, synthetic hard- 
ware corruptions can be introduced. The raw data from the demonstrator were 
preprocessed as described in [28] when creating the SDD dataset. Thereby, 
empirical mode decomposition was applied to determine three intrinsic mode 
functions and their residuals per phase. Subsequently, the mean, standard 
deviation, skewness, and kurtosis were calculated for each, resulting in a total 
set of 48 features. 


The SDD dataset encompasses 58,509 samples and includes 11 distinct classes, 
maintaining a balanced distribution. In this context, class 1 signifies the fault- 
free state of the engine, while the remaining ten classes represent various fault 
cases stemming from issues such as shaft misalignment, axis inclination, or 
bearing failure. A summary of the classes and their respective fault cases is 
provided in Table 1. To facilitate model evaluation, an 80/20 split between the 
training and test sets was employed, resulting in 46,806 samples in the training 
set and 11,703 samples in the test set. 


The SDD dataset is particularly suitable as it offers multiple defect classes 
with diverse characteristics in the area of industrial data, which facilitates AE 
generation. At the same time, it is not too high dimensional, which keeps the 
computation times within reasonable limits. 
Table 1: Error indicators of the individual classes of the SDD dataset. Class 1 is error-free, and
the remaining ten are error cases. Equal classes, such as class 4 and 5, are not identical;
they differ in the level of error, for example, the angle of the axis inclination.


Class 1 2 3 4 5 6 7 8 9 10 11

Bearing Failure 0 0 0 0 0 1 1 1 1 1 1
Axis Inclination 0 0 1 1 1 0 1 1 0 1 1
Shaft Misalignment 0 1 0 1 1 1 1 1 0 1

Model Implementation 


A deep neural network (DNN) with four hidden layers is deployed for evalu- 
ation. The input layer comprises 48 neurons, followed by hidden layers with 
590, 1180, 2360, and 590 neurons, respectively, and an output layer with 11 
neurons. The architecture is based on [7]. Although smaller DNNs can classify 
the SDD dataset [29], the selection should prevent a bottleneck of capacity as 
discussed in [20] and exclude this influence. After each hidden layer, Batch 
Normalization (BatchNorm) [31] is applied followed by the Rectified Linear 
Unit (ReLU) activation function. Dropout [30] with a dropout rate of 20% is 
utilized after each ReLU layer to prevent overfitting. The output layer employs 
a linear layer with Softmax for classification. Cross-entropy loss is used with 
the Adam optimizer [17], configured with parameters $\beta_1 = 0.9$, $\beta_2 = 0.999$,
$\epsilon = 10^{-8}$, and a learning rate of $10^{-4}$. The implementation is carried out using
the PyTorch framework [32]. A min-max scaler is used for preprocessing to 
scale feature values to the range [0, 1]. 


The generation of AEs is performed using the adversarial-robustness-toolbox
[33] and advertorch [34]. For the $L_\infty$ attacks PGD and FGSM, the maximum
perturbation is controlled by the explicit parameter $\epsilon$, which is determined as
described in the following section. The control parameter $\Gamma$ for JSMA,
regulating the proportion of features that can be altered, is set at 14.5% following
[16]. As DeepFool and C&W do not have explicit attack budget parameters,
they are limited to 10 iterations, equivalent to the number of iterations in the
PGD algorithm with the chosen $\epsilon$. Further details, code, and parameter settings
can be accessed here¹, allowing for result reproduction and further research.


Table 2: The perturbation budget influences adversarial robustness. A perturbation per feature of
1% of its range leads to adversarial robustness close to the accuracy of clean data. An
increase in the attack budget significantly reduces adversarial robustness, suggesting a
potential change in the true class.


Attack   Perturbation in %   Clean Accuracy   Adversarial Accuracy

PGD      1                   0.99             0.92
PGD      2                   0.96             0.74
PGD      3                   0.93             0.60
PGD      4                   0.99             0.22


4.2 Results 
Attack Budget for $L_\infty$ Norm Attacks


To establish an attack budget for $L_\infty$ norm attacks, specifically FGSM and PGD,
we selected the stronger of the two variants, i.e., PGD, and tested it with various
parameter values. Table 2 presents the results of these tests. All the attacks
listed in Table 2 were capable of reducing the accuracy of models without
adversarial training by at least 60%. We then observed up to which budget
adversarial training remained effective. A significant drop in adversarial robustness can
be interpreted as an indication that the true class membership has changed,
rendering the model incapable of learning the data distribution. Based on the
results in Table 2, a maximum perturbation of 1% of the feature range was set
as the attack budget for the $L_\infty$ norm attacks.


1 https://ds-juist.init.th-owl.de/j.knaup/ciworkshop 
Table 3: Accuracy results of the cross-comparison of different AE generation methods. Clean
indicates the usage of only original training and test data, respectively. The remaining
training methods utilize a mix of the data in the training phase, and the remaining attack
methods are evaluated on the manipulated data exclusively.


Adversarial                    Adversarial Attack Method
Training Method   FGSM   JSMA   DeepFool   C&W    PGD    Clean

FGSM              0.92   0.21   0.08       0.74   0.91   0.99
JSMA              0.40   0.33   0.26       0.27   0.39   0.43
DeepFool          0.65   0.12   0.24       0.20   0.57   0.99
C&W               0.74   0.09   0.09       0.12   0.67   0.97
PGD               0.93   0.21   0.09       0.74   0.92   0.99
Clean             0.41   0.07   0.01       0.35   0.32   0.99


Cross-comparison of AE Generating Methods 


Table 3 shows that training with JSMA-generated AEs significantly reduces the
accuracy on the original data. PGD and FGSM achieve almost identical values
and are robust to themselves and each other. JSMA and DeepFool reduce 
the accuracies of the other models the most and C&W achieves an increased 
robustness against FGSM and PGD. A detailed discussion is provided in the 
next section. 


5 Discussion 


The results presented in Table 3 align with findings in the literature, where 
FGSM and PGD are commonly used for adversarial training on image data 
[23]. The fact that PGD differs only slightly from FGSM may be attributed to 
factors such as the number of iterations or the limited attack budget. JSMA, 
originally designed for gray-scale images like the MNIST dataset [35], poses 
certain challenges when applied to tabular data. By searching individual pixels 
and increasing or decreasing their values depending on the sign of the adjust- 
ment parameter, JSMA sets individual features here to 0 or 1, respectively. This 
leads to unrealistic inputs, which on the one hand are difficult to classify for 
other models, but on the other hand it is not reasonable to enrich the training
set with them. DeepFool is more suitable in this respect, since the respective 
decision boundary is only slightly exceeded. The C&W attack, known for 
its high success rate in finding minimal AEs [11], may benefit from further 
hyperparameter tuning but at the cost of increased computation time. 


Overall, this study's approach has yielded the expected results. The
preprocessing enabled the application of various algorithms, and training on both
the original and the manipulated data balanced clean and adversarial
accuracy. The warm-up phase added stability to the training
process. A related approach is curriculum-based learning [36], where
attack strength adapts and increases as the training progresses.


Nevertheless, challenges persist in distinguishing between adversarial exam- 
ples and points at which the ground truth has fundamentally changed. While 
approaches like [37] improve PGD by considering the proximity of each input 
to the decision boundary when applying perturbations, they still do not identify 
the true tipping point. Additionally, selecting an appropriate distance measure 
for this assessment remains an open question. Even though $L_\infty$ norm attacks
show promise, perturbations with the same $L_p$ norm can have vastly different
effects. Comparing methods that employ different distance metrics poses
particular challenges. Furthermore, adversarial training defined as a min-max
problem inherently lacks robustness guarantees due to the non-convex nature
of deep neural networks, which makes it intractable to find a global optimum 
[23]. 


Moreover, the adversarial mapping $\mathcal{A}(\cdot)$ in this paper has been limited to
additive perturbations $\delta$. Future research directions may involve exploring the
use of generative adversarial networks (GANs) [38, 39] to create adversarial 
examples. This approach could generate AEs with semantic information, lead- 
ing to more natural and meaningful adversarial examples, commonly referred 
to as semantic adversarials in the literature [41, 40]. 
6 Conclusion and Outlook


In this paper, we presented an approach that extends the application of adver- 
sarial attacks to tabular data for adversarial training. We began by providing 
an overview of various adversarial example generation methods, followed by 
the introduction of a straightforward preprocessing technique and training sta- 
bilization mechanisms. Subsequently, we conducted a comprehensive cross- 
comparison of popular attack methods, including FGSM, JSMA, DeepFool, 
C&W, and PGD, on an industrial dataset. 


The results of our study validate existing findings in the literature, demonstrat- 
ing the effectiveness of FGSM and PGD for adversarial training. However, our 
investigation also highlights the unique challenges posed by tabular data when 
employing methods like JSMA, which generate unrealistic inputs. The quest 
for a suitable distance metric remains a pivotal aspect of future research, as it 
not only determines the presence of adversarial examples but also serves as the 
foundation for method comparisons. 


Looking ahead, the exploration of non-additive perturbations presents a promis- 
ing avenue for the development of new adversarial example generation meth- 
ods. Incorporating semantic contextual information into the generation pro- 
cess may yield more natural and meaningful adversarial examples, albeit with 
potentially higher $L_p$ norm values. This shift toward semantically enriched
adversarial examples could lead to advancements in the robustness of machine 
learning models, particularly in applications involving tabular data. 
1 Introduction


Cameras have become increasingly ubiquitous in our daily lives, whether they 
are positioned in public spaces, within our households, or conveniently nestled 
in our pockets via smartphones. They serve diverse real-world applications, in- 
cluding video surveillance of human activities, observing wildlife, facilitating 
home care, enabling optical motion capture, and enhancing multimedia experi- 
ences. These applications typically entail a sequence of tasks, beginning with 
the detection of moving objects, followed by tracking and recognition. Over 
the past three decades, the field of computer vision has dedicated substantial 
research efforts to the task of detecting moving objects, resulting in a wealth 
of publications (cf. [8] for a review). The number of techniques dedicated 
to addressing scenarios involving moving cameras is steadily increasing, and 
this subject matter has become a significant focus for in-depth investigation, as 
evident from recent comprehensive reviews [14, 15, 7]. 


Focussing on the visual control of moving objects, several approaches have 
been proposed using PTZ cameras, e.g. for surveillance [2], autonomous off- 
road navigation and mobile robots [4], and the filming industry [6]. However, 
available professional systems are still very limited in functionality and flexi- 
bility. 


To this end, we develop an intelligent camera tracking system suitable for the- 
ater, dancing and performances. The system consists of a remote PTZ camera, 
a state-of-the-art real-time object detection and tracking algorithm (YOLOv8),
and a user interface to direct and adjust tracking. In this paper, we sketch our
efforts in developing the system including hardware setup (Sect. 2), application 
requirements (Sect. 3), and detection and tracking algorithms (Sect. 4). We 
conclude with an evaluation (Sect. 5) and discussion (Sect. 6). 


2 Technical Configuration 


The smart camera system consists of a Panasonic AW-UE 4K camera, which
allows for a pan movement denoted by x with range −175° < x < +175°,
a tilt movement denoted by y with range −30° < y < +90°, and an optical
zoom termed z with 1 < z < 24. For each of the three movement coordinates,
the camera allows for adjusting the movement speed in a range from −50 to
50. The current position is received from, and the movement commands are sent
to, the camera via an HTTP API. The visual data is
transferred to a dedicated GPU server via an SDI cable, and the cameras are
connected to the server via LAN.


The camera is mounted on a tripod and the absolute camera position remains 
fixed for the play. 


The GPU Server (Windows 10) is equipped with a Blackmagic Capture Card 
to access the video signal in real time (approx. 40 frames/sec). The server 
features an NVIDIA RTX Titan graphics card (24 GB memory).


3 Requirements 


In this section, we briefly outline the specific requirements for the camera 
system. The requirements can be structured into three main building blocks 
of the overall system. 
3.1 Camera Control


We aim to track objects throughout the stage and over the course of a play or 
performance. The desired position of an object in the frame is the setpoint. 


• We demand setpoint control of pan (x), tilt (y), and zoom (z); the
respective setpoints are denoted by xsp, ysp, and zsp. The pan and tilt
setpoints denote the coordinates where an image object is to be situated.
For zoom control, the object's size is estimated from the current frame
(see perception) and compared with zsp.


• Setpoint control is extended by a dead zone, i.e. if the object moves
but remains within the specified dead zone, no feedback is applied.


• To ensure homogeneous control performance for close and distant
objects which appear on the frame at the same size, the distance of the object
has to be taken into account.
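

The setpoint control with a dead zone can be sketched for a single axis as follows. This is an illustrative assumption, not the controller used in the system: it uses a purely proportional law, a hypothetical gain, normalized image coordinates, and the speed limit of 50 from Section 2.

```python
def axis_velocity(position, setpoint, dead_zone, gain, v_max=50.0):
    """Proportional velocity command for one axis (pan, tilt, or zoom).

    `position` and `setpoint` are assumed to be normalized image coordinates
    (or a normalized object size for zoom). Inside the dead zone no feedback
    is applied; outside, the command is clamped to the camera's speed range.
    """
    error = position - setpoint
    if abs(error) <= dead_zone:
        return 0.0
    return max(-v_max, min(v_max, gain * error))

# Object slightly off the setpoint: stays inside the dead zone, camera rests
print(axis_velocity(position=0.52, setpoint=0.50, dead_zone=0.05, gain=100.0))

# Object clearly right of the setpoint: camera pans toward it
print(axis_velocity(position=0.80, setpoint=0.50, dead_zone=0.05, gain=100.0))

# Very large error: the command saturates at the maximum speed
print(axis_velocity(position=1.00, setpoint=0.00, dead_zone=0.05, gain=100.0))
```

To account for object distance, the gain could for instance be scaled with the current zoom level; that choice is left open here.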


3.2 Perception 


Advanced perception capabilities are required for reliable tracking. A basic 
understanding of the scene and its actors as well as background, possible pro- 
jection, and backstage people is demanded. 


• To detect objects on the current frame, bounding boxes and masks are
considered. A high frame rate and low total latency are desired.


• Detection classes are head, face, body. Detected objects are represented
by the respective bounding boxes. These detection classes allow for
different capture settings such as close-up, knee-up, whole body, and
dialog capture.


• All visible persons on the stage shall be detected. Multiple actors may
be present on stage. Detected objects have to be tracked and assigned
a tracking ID number.


• The object's distance is to be estimated.


• Track losses and switches, e.g. due to occlusion, have to be considered.
Fallback options have to be elaborated.


• Basic association is required for the detected objects and tracks. Objects
from the three detection classes shall be associated to persons.


• To re-identify a person across multiple cameras, face identification is
employed.


3.3 Interface 
A flexible and intuitive Human Machine Interface is required. In short, the 
interface allows for: 

• visualizing the detections, tracks, and associations

• adjusting setpoints and the dead zone online

• changing the tracked object, optionally with smooth transition

• modifying the controller speed

• choosing fallback options

• saving, loading, and visualizing data and configuration

• accessing and controlling all (three) cameras


4 Methods 


To obtain a robust, flexible, and fast smart camera tracking system incorporat- 
ing the requirements outlined in Section 3, we propose the workflow depicted 
in Fig. 1. 


Here, the iterative loop starts with a new frame delivered by the PTZ camera.
The frame is subsequently processed in the perception module, where the target
object is detected, tracked, and identified. The object's information is then fed
into the controller module. In combination with the user interface settings, the
controller computes the outputs for pan, tilt, and zoom velocities so as to close
the feedback loop.


Figure 1: Key modules and the basic workflow for the proposed smart camera system.


In the following, we describe the methods and approaches for the key modules 
separately.


4.1 Perception Module 


The perception module receives an image, detects all relevant objects in this 
image, tracks the objects from frame to frame, associates tracks with persons, 
and identifies them if possible. 


4.1.1 Object Detection 


To detect objects, namely persons, in the image, a custom model for YOLOv8
[1] has been developed. The model yields bounding boxes for three classes:
head, face, and body. For training the model, several datasets such as
Hollywood Heads [9], CrowdHuman [10], Facenet [11], and COCO [12] have been
combined. Since the desired classes were not covered by all datasets,
cross-inference was used to label the missing classes in each dataset.


Figure 2: Metrics of the custom YOLOv8 model. (a) Training metrics for the
custom detection model. (b) Confusion matrix of the custom model.


The obtained dataset consisted of approx. 120,000 labelled images. Using the
open source tool FiftyOne [13], we excluded clones and crowds and removed the
most similar images. Furthermore, to decrease the false positive detection
rate, 10 % background images (no objects, no labels) were added to the dataset.
The final training dataset consists of approx. 32,000 labelled images.


Training results and the confusion matrix are depicted in Fig. 2. Accuracy is
high for head (94 %), face (87 %), and body (92 %). In various rehearsals, we
verified that the model is robust and copes very well with different lighting
conditions.


4.1.2 Object Tracking 


For tracking the objects detected by our custom YOLOv8 model, we used the
Kalman-based tracking algorithm ByteTrack [16]. In comparison with other
state-of-the-art (SOTA) multi-object trackers (see references in [16]),
ByteTrack provided the best tradeoff between inference speed and accuracy for
our purpose of tracking actors on stage.


4.1.3 Association and Recognition 


To associate head, face, and body objects with persons, we utilize intersection
over union (IoU). To this end, we compare the extent of overlap of all detected
bounding boxes. The association works pairwise: if the smaller bounding box
is covered by at least 90 %, we associate the two bounding boxes with the same
person.
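

The pairwise rule can be illustrated with a minimal sketch. It assumes
axis-aligned boxes given as corner coordinates and interprets "covered by at
least 90 %" as the intersection area divided by the area of the smaller box;
the names `Box`, `coverage_of_smaller`, and `same_person` are hypothetical and
not taken from the system described here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box with corners (x1, y1) and (x2, y2) in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def coverage_of_smaller(a: Box, b: Box) -> float:
    """Fraction of the smaller box that lies inside the other box."""
    smaller = min(a.area, b.area)
    return intersection_area(a, b) / smaller if smaller > 0 else 0.0


def same_person(a: Box, b: Box, threshold: float = 0.9) -> bool:
    """Assumed pairwise rule: associate if the smaller box is covered >= threshold."""
    return coverage_of_smaller(a, b) >= threshold


body = Box(100, 50, 300, 600)
head = Box(160, 60, 240, 150)        # lies completely inside the body box
other_head = Box(290, 60, 370, 150)  # only a thin strip overlaps the body box

print(same_person(body, head))        # head and body belong together
print(same_person(body, other_head))  # head of a neighbouring person
```

Normalizing by the smaller box rather than by the union keeps a small head
box from being rejected merely because the body box is much larger.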


4.1.4 Fallbacks 


At runtime, a track may be lost, e.g. due to occlusion, turnarounds, or dance
moves where the head and face are temporarily not visible. Elaborating useful
fallback options is thus essential to ensure reliable tracking.


The basic idea of the fallback strategy is to use the associations obtained.
If, e.g., the head track is lost, the fallback consists of automatically
switching the track to the person's face if available, otherwise to the
person's body. If at some later stage a head is reassociated with the currently
tracked body or face, the system automatically switches back to head tracking.
Note that it may be required to transition smoothly to the new setpoints.
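

The fallback logic can be sketched as a simple priority lookup. This is only
an illustration under the assumption that each person holds at most one active
track per class; `select_track` and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Preference order when head tracking is preferred (as in the example above).
FALLBACK_ORDER = ("head", "face", "body")


def select_track(person_tracks: Dict[str, Optional[int]],
                 order: Tuple[str, ...] = FALLBACK_ORDER
                 ) -> Optional[Tuple[str, int]]:
    """Return (class_name, track_id) of the best currently available track.

    `person_tracks` maps a class name to the active track ID associated with
    the person, or None if that track is currently lost.
    """
    for cls in order:
        track_id = person_tracks.get(cls)
        if track_id is not None:
            return cls, track_id
    return None  # nothing associated with this person is visible


# Head visible: track the head.
print(select_track({"head": 7, "face": 12, "body": 3}))
# Head lost (e.g. occlusion): fall back to the face, then to the body.
print(select_track({"head": None, "face": 12, "body": 3}))
print(select_track({"head": None, "face": None, "body": 3}))
# A head is re-associated later: the selection returns to the head.
print(select_track({"head": 21, "face": None, "body": 3}))
```

Because the selection is re-evaluated on every frame, switching back to the
head happens without extra state; a smooth setpoint transition would have to
be handled separately.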


4.2 Camera Control Module 


The control module receives the location and size of the current object of 
interest in the image. Given the current setpoints, the controller computes the 
errors and a corrective feedback. To this end, we use separate PI controllers for 
pan, tilt, and zoom control. 
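

A minimal sketch of one such PI controller is given below. The discrete-time
form, the dead zone handling, the output saturation, and all numerical values
are assumptions made for illustration; the sign convention (which direction a
positive command moves the camera) depends on the actual device.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete PI controller with a dead zone and output saturation."""
    kp: float
    ki: float
    dead_zone: float = 0.0   # errors smaller than this are treated as zero
    out_limit: float = 1.0   # symmetric limit of the velocity command
    integral: float = 0.0    # accumulated error (controller state)

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        if abs(error) < self.dead_zone:
            error = 0.0
        self.integral += error * dt
        output = self.kp * error + self.ki * self.integral
        # Clamp the command to the allowed velocity range.
        return max(-self.out_limit, min(self.out_limit, output))


# One controller per axis; positions are normalized image coordinates.
pan = PIController(kp=0.8, ki=0.2, dead_zone=0.02)
command = pan.update(setpoint=0.5, measurement=0.62, dt=1 / 40)
print(f"pan velocity command: {command:.4f}")
```

In practice an anti-windup measure would usually be added so that the
integral term does not keep growing while the output is saturated.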


4.2.1 Output linearization 


Controller output ($x_{out}^{PID}$) and camera output ($x_{out}^{cam}$) are
non-linearly correlated, see Fig. 3a.


The measured response (Fig. 3a) is corrected using a sigmoid function:


$$f_{cor}(x) = 100 \left( \frac{1}{1 + e^{-x}} - 0.5 \right)$$


(see Fig. 3b). The resulting (linearized) IO response is depicted in Fig. 3c.
The response is suppressed in the range $0 < |x_{out}| < 9$, linear in the
range $10 < |x_{out}| < 35$, and saturated for $|x_{out}| > 36$. The correction
is applied to pan and tilt. The zoom IO response is already almost linear, so
no correction is required (data not shown).


Figure 3: IO pan response curve and proposed correction. (a) Measured
responses. (b) IO correction. (c) Corrected IO response.


4.2.2 Distance estimation 


We trained a small dense multilayer perceptron (MLP) on data acquired from
the PTZ camera and a laser rangefinder. The network's inputs are head width,
head height, and the zoom level; the output is the distance estimate d [m].
The model provides distance predictions in real time, and validation showed
acceptable performance. A detailed analysis is beyond the scope of this paper.


4.2.3 Adaptive PID controllers 


Pan, tilt, and zoom are controlled using PI controllers. The parameters have
been tuned manually. The interface allows the gains to be adjusted online.


To compensate for a trigonometric non-linearity in pan and tilt, the
camera-object distance is estimated using the trained MLP as described above.
The pan and tilt controller gains are then automatically scheduled as:

$$K_p(d) = K_p^{3} \cdot \frac{3}{d},$$

where $K_p^{3}$ is the preset gain tuned for an object distance of 3 m. In the
real-time setting, we filter the distance estimate of the head object by
averaging the last three distance estimates.
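

The combination of distance filtering and gain scheduling can be sketched as
follows. The scheduling law used here, a reference gain scaled inversely with
distance and normalized to 3 m, is an assumption made for this illustration,
as are the class name `ScheduledGain` and the example gain value.

```python
from collections import deque

REFERENCE_DISTANCE_M = 3.0  # distance at which the preset gain was tuned


class ScheduledGain:
    """Schedules a proportional gain from a smoothed distance estimate."""

    def __init__(self, kp_ref: float, window: int = 3):
        self.kp_ref = kp_ref
        self.history = deque(maxlen=window)  # keeps the last `window` estimates

    def update(self, distance_m: float) -> float:
        """Add a new distance estimate and return the scheduled gain."""
        self.history.append(distance_m)
        d_avg = sum(self.history) / len(self.history)
        # Assumed scheduling law: K_p(d) = K_p_ref * 3 / d
        return self.kp_ref * REFERENCE_DISTANCE_M / d_avg


schedule = ScheduledGain(kp_ref=0.8)
for d in (3.0, 6.0, 9.0, 6.0):
    print(f"d = {d:.1f} m -> Kp = {schedule.update(d):.3f}")
```

Averaging over the last three estimates damps frame-to-frame jitter of the
MLP output before it changes the controller gain.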
4.3 Interface module


Figure 4: An overview of the interface (HMI created by SBT, 2022). The
screenshot shows controls for the preferred tracking class (head, face, body,
any), the setpoints, the fallback options, manual track ID changes, and the
controller speed tuning gain.


The interface module is based on the Python package NiceGUI [3]. The interface
displays the current image and detections and supports interactive selection
of tracks. It allows the user to turn detection and the controllers on and
off, to select tracks and adjust setpoints interactively, and to set fallback
options and tune the controllers online.


5 Implementation and results 


The modules are programmed using Python 3.11. Communication between 
the modules is established by using UDP. Associations (persons) as well as the 
current tuning and settings are stored using an SQLite database. 


The system is currently subject to intensive tests. A current performance record 
is depicted in Fig. 5. Here, pan, tilt and zoom controllers are active, and control 
is induced by setpoint changes. 
(a) Pan control. (b) Tilt control. (c) Zoom control.


Figure 5: Control performance. 


6 Discussion 


This paper presents an intelligent camera system for stage performances based
on a single PTZ camera. The motivation behind this system is to capture
close-ups and dynamic shots, allowing for projection on a screen on large stages,
and accommodating hybrid formats. The system aims to provide maximum 
freedom of movement for actors, dancers, and musicians. 


The camera system we developed here employs several perception methods
based on recent advances in machine learning, first and foremost computer
vision. The perception module delivers objects and their associated tracks in
terms of coordinates on the current frame. Given the target location
(setpoints), PI controllers stabilize the errors. Non-linearities resulting
from the I/O response and from trigonometry are addressed.


The proposed system supports real-time, low-latency, and multi-camera
setups. We handle approx. 40 frames per second, and the overall latency
is approx. 80 ms. The system implementation utilizes APIs, deep learning
frameworks, and interprocess communication for camera control, perception,
and interface functionalities. Overall, this intelligent camera system offers
an innovative solution for capturing stage performances with high precision,
adaptability, and real-time capability.
Abstract


Condition monitoring is a key component of condition-based and predictive
maintenance solutions and has applications in a wide range of industries.
However, extracting long-term asset condition information from process data is
not a trivial process. The objective of this paper is to present the first
steps in developing a condition monitoring solution using a hybrid modeling
approach. The paper provides an introduction to condition monitoring and hybrid
modeling and focuses on the problem of calibrating a first-principles-based
simulation. Several possible approaches to modeling calibration coefficients
that vary during the process simulation were considered. Our results show that
the developed piecewise constant approach, together with the tuned version of
the Nelder-Mead optimization algorithm, makes it possible to accelerate the
calibration process without increasing the simulation error.


1 Introduction 


The design life of equipment is often conservative because, in practice, actual
operating and environmental conditions may differ significantly from those
considered in the design. Therefore, during operation, a remaining life
assessment is required to determine the actual remaining life of critical
equipment, which may be shorter or longer than the design life [4]. However,
extracting long-term asset condition information from process data is not a
trivial process. One possible solution could be to use a condition monitoring
solution based on a hybrid modeling approach, i.e., a combination of a
first-principles-based simulation model with machine learning algorithms.


This article presents the results of an ongoing research project with an industry
partner, and not all project details can be disclosed. The article is organized
as follows. Section 2 provides an introduction to condition monitoring. The
hybrid modeling approach and examples of its application to condition
monitoring are presented in Section 3. Then, the problem of calibrating the
first-principles-based simulation is introduced in Section 4. Section 5
presents a case study to demonstrate and test the developed model calibration
approach. Finally, the conclusions are presented in Section 6.


2 Condition Monitoring 


Condition Monitoring (CM) is the process of monitoring the condition of
industrial assets (manufacturing equipment, machinery, parts, auxiliary systems
and components, etc.) during operation. Condition monitoring is a main part
of condition-based and predictive maintenance solutions and has applications
in a broad range of industries [1]. In general, the development of a CM solution
consists of three main parts: data collection, data exploration and processing,
and the development of a CM algorithm [2]. Depending on the industry and
field of application, all three parts can vary significantly from solution to
solution. The data source for CM can be either specially designed and installed
sensors [3] or the existing infrastructure used for process monitoring [?]. The
basic idea behind data-driven condition monitoring is that it is possible to
extract patterns and trends (condition indicators) from a large amount of
collected data and to infer the deterioration status of equipment for which
condition monitoring sensors are unavailable or do not exist. Data-driven
condition monitoring relies on various data sources and types of measurements
acquired during equipment operation and uses various data mining techniques
and algorithms [2].


3 Hybrid Modeling 


Hybrid modeling is a combination of two paradigms, first-principles-based
and data-driven models, in a single architecture (Fig. 1). First-principles
(physics-based) models are based on formalized expert knowledge of a problem,
including design data, material properties, etc. In contrast, data-driven
methods rely only on collected data. Hybrid modeling already has a portfolio
of applications in the process industry [6] and several examples of applications
to condition monitoring.


Figure 1: Hybrid modeling is the fusion of two worlds: machine learning and
first-principles-based simulation.


Leturiondo et al. [8] use hybrid modeling to monitor the condition of rolling
bearings. Gálvez et al. [?] use hybrid modeling for condition monitoring of the
heating, ventilation, and air conditioning (HVAC) system in passenger trains.
In both works, the authors proceed from the hypothesis that, due to scheduled
maintenance and service of equipment, the real measurements collected by
sensors located in the real system contain very limited information about the
degradation of elements, especially in the late stages of degradation.
Therefore, in both studies, physics-based models are used to generate synthetic
data for operation with known degradation levels and equipment failures. In our
study, we propose and test a different way of condition monitoring using a
hybrid modeling approach. We assume that the degradation process of the
equipment can be tracked in real measurements by using a detailed and
calibrated physics-based simulation of the process. The calibration
coefficients of the physics-based model can possibly be used as condition
indicators. For this purpose, the calibration of the physics-based model should
be relatively fast, because we plan to trace the changes of the calibration
coefficients during the whole lifetime of the equipment. If our hypothesis is
confirmed, the next step would be to create a data-driven model that could
extract condition indicators from process data without the use of simulation
and optimization.


4 Simulation Calibration 


One of the important steps in the development of a hybrid model is the calibra- 
tion of the physics-based model (simulation) to better match the industrial data 
(real measurements). Model calibration is the manual or automated process of 
estimating and adjusting model parameters (calibration coefficients) by fitting 
the model output to real data. 


The simulation calibration process consists of the following steps (Fig. 2). 
In the first step, the equipment design data and actual process data (mea- 
surements) are prepared and fed into the simulation. The second step is the 
execution of the process simulation. In the third step, the parameters calculated 
in the simulation are compared with the real measurements. The simulation 
error is calculated and passed to the optimization algorithm. The goal of 
the optimization algorithm in the fourth step is to minimize the simulation 
error by finding optimal calibration coefficients. The process is repeated until 
the desired accuracy is achieved or the number of simulations is exhausted. 
The choice of an effective optimization algorithm for model calibration is not 
obvious and requires considerable experience and some experimentation. 
Figure 2: Flowchart of the Simulation Calibration Process
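

The four steps of the calibration loop can be sketched in a few lines. The
example below is only an illustration: `simulate` is a hypothetical toy
stand-in for the expensive process simulation, the "measurements" are
synthetic, and the simple bounded compass search is used purely for
demonstration and is not one of the algorithms compared in this paper.

```python
import math


def simulate(coefficient, inputs):
    """Stand-in for the expensive process simulation (hypothetical toy model)."""
    return [coefficient * u for u in inputs]


def rmse(simulated, measured):
    """Root mean square error between simulated and measured values."""
    n = len(measured)
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured)) / n)


def calibrate(inputs, measured, start, lower, upper,
              max_evals=30, target_error=1e-3, step=0.25):
    """Bounded 1-D compass search: simulate, compare, adjust, repeat."""
    evals = 0

    def cost(c):
        nonlocal evals
        evals += 1
        return rmse(simulate(c, inputs), measured)  # steps 2 and 3

    best_c, best_err = start, cost(start)
    # Repeat until the desired accuracy is reached or the budget is exhausted.
    while evals < max_evals and best_err > target_error:
        improved = False
        for candidate in (best_c - step, best_c + step):   # step 4: propose
            candidate = min(upper, max(lower, candidate))  # respect the bounds
            if candidate == best_c or evals >= max_evals:
                continue
            err = cost(candidate)
            if err < best_err:
                best_c, best_err, improved = candidate, err, True
                break
        if not improved:
            step /= 2  # refine the search around the current best value
    return best_c, best_err, evals


# Step 1: prepare the data (synthetic "measurements" with a known coefficient).
inputs = [1.0, 2.0, 3.0, 4.0]
measured = simulate(0.62, inputs)
coefficient, error, n_evals = calibrate(inputs, measured, start=0.5,
                                        lower=0.0, upper=1.0)
print(f"coefficient={coefficient:.3f}, rmse={error:.4f}, simulations={n_evals}")
```

The evaluation counter makes the cost of calibration explicit: every call to
`cost` corresponds to one full simulation run.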


4.1 Calibration Coefficients 


The process simulation has a set of calibration coefficients. From the available
coefficients, we have selected two that could potentially be used as condition
indicators. Since they are dimensionless coefficients, we simply label them
Calibration Coefficient 1 (CC1) and Calibration Coefficient 2 (CC2). CC1 does
not change during a single process simulation. CC2, on the other hand,
decreases during the process simulation (Fig. 3), but can be modeled as a
constant for simplification. Modeling CC2 as a constant throughout the process
simulation is an oversimplification that is suitable for approximate
estimation but not adequate for our purpose.


Due to the complex behavior of CC2, two other calibration approaches
were considered.


Approach 1 (exponentially decaying curve): In theory, the behavior of CC2
can be modeled with an exponentially decaying curve (Fig. 3) described by the
equation $CC2 = A \cdot \exp(-t/\tau) + B$, where $t$ is the simulation time and
$A$, $B$, and $\tau$ are unknown parameters. Unfortunately, attempts to determine
optimal values for the parameters $A$, $B$, and $\tau$ have not been successful
because this approach requires too many time-consuming simulations.


Approach 2 (piecewise constant): Because of the difficulties we encountered in
determining the coefficients of the exponential curve, we developed a different
approach. The simulation is divided into many small steps (overlapping
windows) in which CC2 is modeled as a constant. Then, using an optimization
algorithm, the optimal value of CC2 is calculated sequentially for each step.
In this approach, CC1 is optimized only in the first step.


Figure 3: Three ways to model CC2: 1-Constant throughout the simulation;
2-Exponentially decaying curve; 3-Piecewise constant


4.2 Optimization Algorithm 


The calibration of the simulation model can be considered a simulation-based
optimization problem or a Derivative-Free Optimization (DFO) problem.
Simulation optimization is a very broad topic that involves algorithms that
come from many different fields, have connections to many different
disciplines, and have been used in many practical applications [7]. DFO
can be considered a sub-field of simulation optimization. Most DFO algorithms
are specifically designed under the assumption that function evaluations or
simulations are expensive.


The optimization objective is to minimize the difference between the real data
and the simulated data. In our case, the Root Mean Square Error (RMSE)
between the simulated parameter and the actual (observed) measurement is used
as the cost function. Since every objective function evaluation requires a
time-consuming process simulation and gradient information is not available,
DFO algorithms are best suited for our task. The DFO algorithms built into the
MATLAB environment we use, including the Statistics and Machine Learning
Toolbox, the Optimization Toolbox, and the Global Optimization Toolbox, are
listed in Table 1.


Table 1: List of DFO methods built into MATLAB and available in MATLAB Toolboxes


Algorithm                     MATLAB function
Nelder-Mead Simplex Method    fminsearch
Golden Section Search         fminbnd
Pattern Search                patternsearch
Surrogate Optimization        surrogateopt
Genetic Algorithm             ga
Particle Swarm Solver         particleswarm
Simulated Annealing           simulannealbnd
Bayesian Optimization         bayesopt


5 Experiment 


The goal of our work is not a detailed benchmarking of algorithms, but the 
selection of the most efficient optimization algorithm for our concrete problem. 
We formulated several requirements for the optimization algorithm: 


• The calibration coefficients have physical bounds, e.g., from 0% to 100%,
and the simulation cannot take values outside these limits. For this
reason, the optimization algorithm must be constrained.


• Detailed simulation takes time, so the algorithm must use a limited
number of function evaluations (as few as possible).


• Since we have several coefficients to optimize, the optimization
algorithm should support multi-variable optimization.


Proc. 33. Workshop Computational Intelligence, Berlin, 23.-24.11.2023 161 


The MATLAB software package contains several algorithms suitable for our
problem. We compare three of them: a constrained version of the Nelder-Mead
simplex method, Pattern Search, and Bayesian optimization. Nelder-Mead
and Pattern Search are widely used direct search methods [10]. In the
MATLAB implementation, the Nelder-Mead algorithm does not support constraints.
However, there is a popular community version of the algorithm with
constraints. Bayesian optimization is a global optimization algorithm that has
recently become extremely popular for tuning hyperparameters in machine
learning models [11].
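

How the community version enforces its constraints is not detailed here. One
widely used way to impose simple bounds on an unconstrained method such as
Nelder-Mead is a variable transformation, sketched below as an assumption; the
function names are hypothetical.

```python
import math


def to_bounded(z: float, lower: float, upper: float) -> float:
    """Map an unconstrained value z onto [lower, upper] via a sine transform."""
    return lower + (upper - lower) * (math.sin(z) + 1.0) / 2.0


def to_unbounded(x: float, lower: float, upper: float) -> float:
    """Inverse mapping for a starting point x inside [lower, upper]."""
    ratio = 2.0 * (x - lower) / (upper - lower) - 1.0
    return math.asin(max(-1.0, min(1.0, ratio)))


def bounded_objective(objective, lower: float, upper: float):
    """Wrap an objective so an unconstrained optimizer only sees feasible points."""
    return lambda z: objective(to_bounded(z, lower, upper))


# Example: a coefficient restricted to the range 0 ... 100 (percent).
wrapped = bounded_objective(lambda c: (c - 62.0) ** 2, 0.0, 100.0)
z0 = to_unbounded(50.0, 0.0, 100.0)
print(wrapped(z0))                      # objective evaluated at c = 50
print(to_bounded(1e6, 0.0, 100.0))      # any z maps into the feasible range
```

The optimizer then searches over `z` without constraints, while the
simulation only ever receives values inside the physical bounds.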


5.1 Experiment Setup 


For the experiment, we used the industrial process simulation with a total 
duration of 85 days. According to the developed piecewise constant approach, 
the simulation is divided into 28 steps of 5 days each (window size is 5 days 
with an overlap of 2 days). All optimization algorithms are set to the same 
maximum number of iterations, 30 for the first step (window) and 15 for all 
subsequent steps. For the Nelder-Mead and Pattern Search algorithms, the 
starting point is the optimal solution from the previous step. The optimization 
constraints are the same for all algorithms. 
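

The window layout can be reproduced with a short sketch. It assumes that the
windows advance by the window size minus the overlap and that the last window
is clipped to the end of the simulation; this clipping is an assumption that
happens to yield 28 windows, and `make_windows` is a hypothetical helper.

```python
def make_windows(total_days: int, size: int, overlap: int):
    """Return (start, end) day pairs for overlapping calibration windows."""
    stride = size - overlap
    windows = []
    for start in range(0, total_days - overlap, stride):
        end = min(start + size, total_days)  # assumed: clip the last window
        windows.append((start, end))
    return windows


windows = make_windows(total_days=85, size=5, overlap=2)
# Iteration limits: 30 for the first window, 15 for all subsequent ones.
budgets = [30] + [15] * (len(windows) - 1)

print(len(windows))             # number of calibration steps
print(windows[0], windows[1])   # consecutive windows share two days
print(windows[-1])              # last (clipped) window
print(sum(budgets))             # total iteration budget over all windows
```

Under these assumptions the total iteration budget over all windows is
30 + 27 * 15 = 435.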


For the Nelder-Mead algorithm, we carried out two experiments, one with the
default settings and the other with a modified tolerance for both the objective
function value and the variable. The tolerance settings of the Pattern Search
algorithm are equivalent to the tuned version of the Nelder-Mead algorithm,
and for Bayesian Optimization, there are no tolerance settings. The settings of
the experiments are summarized in Table 2.


5.2 Experiment Results 


All three optimization algorithms solved the optimization problem, but with a
significantly different number of objective function evaluations and slightly
different RMSE values after optimization.


Table 2: Experiment Setup


Short Name   Algorithm               Tolerance
NM1          Nelder-Mead             Default: 1e-4
NM2          Nelder-Mead             TolFun: 1e-03, TolX: 0.1
BO           Bayesian Optimization   Does not support tolerance settings
PS           Pattern Search          TolFun: 1e-03, StepTolerance: 0.1,
                                     MeshTolerance: 0.1


Since we have 28 steps in our proposed piecewise constant approach, the median
RMSE value is used to compare the optimization algorithms. Figure 4 shows the
number of evaluations of the objective function (simulations) that are used by
each of the algorithms at each of the 28 steps. The Nelder-Mead algorithm with
the default setting uses slightly more function evaluations than the imposed
limit. However, by tuning the optimization tolerance, we were able to
significantly reduce the number of function evaluations while keeping the
median RMSE at the same level. This is illustrated in Figure 5, which shows
the total number of function evaluations and the median RMSE value over 28
steps. The Pattern Search algorithm, with the same tolerance settings as the
tuned Nelder-Mead algorithm, required more objective function evaluations and
yields slightly worse RMSE values. As expected, Bayesian optimization without
tolerance settings uses the entire limit of objective function evaluations,
but does not show better RMSE values. The results of the experiment are
summarized in Table 3. It should be noted that we also found that the
constrained version of the Nelder-Mead algorithm can have convergence problems
when the solution is very close to the boundary, which is not a problem for
Bayesian optimization and Pattern Search. The resulting values of CC2 after
the optimization process are shown in Figure 6.
Table 3: Experiment Result


Algorithm     Total Number of         Optimal Value   Median
Short Name    Function Evaluations    CC1             Simulation Error
NM1           462                     1.087           0.0025
NM2           218                     1.087           0.0025
BO            435                     1.094           0.0028
PS            291                     1.100           0.0031
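

To put the numbers in Table 3 into perspective, the savings relative to the
default Nelder-Mead run can be computed directly from the reported totals. The
snippet below uses only the values from the table; the labels NM1 (default)
and NM2 (tuned) follow Table 2.

```python
# Total function evaluations and median simulation error from Table 3.
results = {
    "NM1": {"evals": 462, "rmse": 0.0025},
    "NM2": {"evals": 218, "rmse": 0.0025},
    "BO": {"evals": 435, "rmse": 0.0028},
    "PS": {"evals": 291, "rmse": 0.0031},
}

baseline = results["NM1"]["evals"]
savings = {
    name: 100.0 * (baseline - r["evals"]) / baseline
    for name, r in results.items()
}

for name, r in results.items():
    print(f"{name}: {r['evals']} evaluations, "
          f"{savings[name]:.1f} % fewer than NM1, median RMSE {r['rmse']}")
```

The tuned Nelder-Mead run needs roughly half as many simulations as the
default run at the same median error.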
Figure 4: Number of function evaluations for each calibration step. The tuned
version of the Nelder-Mead algorithm (NM2) uses fewer objective function
evaluations than the other optimization methods, especially in the first
17 steps.


6 Conclusion 


This paper presents the first steps in the development of a condition monitoring
solution using hybrid modeling. In this phase, we considered several possible
ways to model calibration coefficients that vary during the process simulation.
To address this issue, the piecewise constant approach was developed and then
tested with three different optimization algorithms and different optimization
tolerance settings. The experimental results show that the simulation
calibration process can be significantly accelerated by tuning the tolerance
parameter of the optimization algorithm without increasing the simulation
error. The best results were obtained using the tuned version of the
Nelder-Mead algorithm, but the optimal balance between optimization speed and
simulation error needs to be further investigated. The next step is to use the
developed simulation calibration approach to determine the potential of using
the calibration coefficients as condition indicators in a condition monitoring
solution.


Figure 6: Resulting CC2 values after the calibration procedure. The values
differ only slightly from each other, but the number of function evaluations
used differs considerably, see Fig. 5.
Abstract


Machine Learning (ML) is currently experiencing unprecedented popularity.
The operationalization of ML models is
governed by a set of concepts and methods referred to as Machine Learning 
Operations (MLOps). Nevertheless, researchers, as well as professionals, often 
focus more on the automation aspect and neglect the continuous deployment 
and monitoring aspects of MLOps. As a result, there is a lack of continuous 
learning through the flow of feedback from production to development, causing 
unexpected model deterioration over time due to concept drifts, particularly 
when dealing with scarce data. This work explores the complete application 
of MLOps in the context of scarce data analysis. The paper proposes a new 
holistic approach to enhance biomedical image analysis. Our method includes: 
a fingerprinting process that enables selecting the best models, datasets, and 
model development strategy relative to the image analysis task at hand; an auto- 
mated model development stage; and a continuous deployment and monitoring 
process to ensure continuous learning. For preliminary results, we perform a 
proof of concept for fingerprinting in microscopic image datasets. 
1 Introduction


In the field of image analysis, Machine Learning (ML), particularly its subfield
Deep Learning (DL), is being explored to model complex problems such as
image registration, classification, segmentation, object detection and tracking
[1, 29]. In this context, the goal of ML is to build a model that generalizes
across different images for the same analysis task, for example, image
segmentation. Therefore, the research community focuses more on developing
new methods that achieve better performance and are computationally more
efficient [17]. While developing the ML model offline seems easy and cheap,
operationalizing the model, that is, deploying the model and maintaining its
performance over time, still faces numerous challenges [10]. Unlike
standard non-ML-based software, whose operations include building, testing,
deployment and monitoring, ML systems are more complex due to the two new
components, model and data, and their direct and generally challenging
relationship. These systems are often a source of high technical debt when not
operationalized consistently [28]. Similarly to standard non-ML-based software,
whose lifecycle follows the Development and Operations (DevOps) scheme
[3], ML systems have their own development and operation paradigm known as
Machine Learning Operations (MLOps) (see Section 2.1).


Nevertheless, as shown by the multitude of approaches developed [17], it is
very challenging to develop and operationalize one single model that
generalizes across different images for the same task. Many experts develop new
models for the same image analysis task when new input data is available.
Because this process is time-consuming and costly, they often focus more on
developing the model and neglect the continuous monitoring and deployment
aspect [16]. This results in a lack of continuous learning through the flow of
feedback from production to development, and therefore in unexpected model
deterioration over time [28]. Additionally, the initial training data in this
field is often limited in quantity and quality because tasks such as image
acquisition and image annotation are tedious and require skilled and
expensive experts. This often leads to a dataset containing less relevant data
from the experiments and more noise.
A potential solution is to create a production-oriented machine learning
approach that harnesses the full range of available and well-established
datasets and models to improve the efficiency of image analysis across multiple
tasks. This is the main research goal of our work. In this paper, we explore
MLOps for sparse image analysis. We propose a holistic, long-term approach to
enhance biomedical image analysis that includes: a fingerprinting
process that enables selecting the best models, datasets, and model
development strategy relative to the task at hand, thereby making use of
available models and datasets; an automated model development stage; and a
continuous deployment and monitoring process.


Our work is organized in the following manner. Section 2 explains the funda- 
mental concepts related to our work and provides an overview of other related 
works, and Section 3 details the proposed approach. In Section 4, the prelim- 
inary experiments carried out during the investigations are described. Their 
results are presented and discussed in Section 5. Section 6 summarizes our 
investigation and outlines future research directions. 


2 Background and Related Work 


In this section, we provide fundamental notions around MLOps and other im- 
portant fields related to our work. We also present state-of-the-art approaches 
related to ours in general and regarding the different building blocks in partic- 
ular. 


2.1 MLOps 


MLOps is a discipline that combines ML with software engineering paradigms 
such as DevOps and data engineering to enable efficient deployment and op- 
erationalization of ML systems [16, 30]. It can be seen to a certain extent as 
DevOps for ML systems. The key differences and similarities between DevOps 
and MLOps can be seen in Figure 1. On the one hand, both entail two main
concepts: Continuous Integration and Continuous Deployment.


• Continuous Integration (CI): consists of automatically building, testing,
and validating source code.


• Continuous Deployment (CD): enables frequent release cycles by
automatically deploying the software in production. It distinguishes itself
from continuous delivery by automating the deployment process.


Figure 1: DevOps vs. MLOps [12]. While DevOps entails applying the main concepts of
Continuous Integration (CI) and Continuous Deployment (CD) on code only, MLOps
has the two new components data and model, which are added to the code component.
Additionally, MLOps has a new concept named Continuous Training (CT).


On the other hand, in addition to normal source code, ML systems bring two 
new components: data and model, which are code-independent and therefore 
maintained separately from the code. This leads to the creation of pipelines 
to perform the complete processing. As a result, a novel concept specific 
to MLOps named Continuous Training (CT) emerges, which involves auto- 
matically retraining and serving the models. Furthermore, the CI and CD 
concepts in MLOps differ from those in DevOps in that CI does not only 
include integrating new code and components but also new data, models and 
pipelines, and CD does not deploy a single software package, but a complete 
ML training and/or serving pipeline. Additionally, continuous monitoring,
i.e., automatically monitoring the IT system to detect and address potential
problems such as compliance issues and security threats, is extended in
MLOps by monitoring production data and model performance metrics. This
enables a real-time understanding of model performance [32].


Although MLOps is a relatively new field, important work and progress have
been made. While some researchers focus on properly defining MLOps and 
providing an overview of its concepts and best practices [16, 30, 32], others 
investigate the challenges faced in ML systems operationalization. Tamburri 
highlights in [31] the limited attention given to MLOps within academia and 
the lack of data engineering skills in academia as well as in industry. Renggli
et al. [25] and Granlund et al. [7] expand on this and show that data management
is a major problem in MLOps. This is mostly due to the strong dependency
between model performance and data quality. Google's Cloud Architecture
Center describes in [2] how to automate ML workflows. They classify MLOps
into three different levels: MLOps Level 0, which refers to classical ML pipelines
with no CI, no CD and manually operated workflows totally disconnected from
the ML system; MLOps Level 1, in which data validation, model validation,
continuous delivery and CT are introduced; and MLOps Level 2, in which CI,
CT and continuous delivery are fully explored. They apply semi-automated
deployment in a pre-production environment and manual deployment in the
production environment.


Several MLOps tools have been developed. A non-exhaustive list of tools and 
their features can be found in [16, 24, 30, 32]. There is no ideal tool, as each 
tool covers different MLOps aspects. In practice, tools are often combined 
to achieve maximal efficiency. However, the majority of tools focus on model 
versioning and tracking and ignore dataset versioning. This impedes the
reproducibility of results and makes it dependent on the coding practices of
skilled experts [36]. In industry particularly, due to the large and complex
nature of IT software landscapes, the MLOps tools are often diverse and must
match a specific established strategy.


Despite advancements in the MLOps domain, there are very few published 
real-world use cases in which MLOps is clearly designed, explained and ap- 
plied. One use case found is Oravizio [8], a medical software for evaluating 
the risks associated with joint replacement surgeries. The researchers had 
four different risk models from which the best was selected and deployed. 
One issue they faced during development was related to data management, 
due to the multitude of data formats they had to process. A second use case
is SemML [38], in which the researchers propose an ML-based system that
leverages semantic technologies to enhance industrial condition monitoring
for electric resistance welding. Their workflow enables them to reuse and 
enhance ML pipelines over time based on new input data. A third example 
is microbeSEG [26], a DL-based tool for instance segmentation of objects. 
The researchers automate several data management tasks and model building 
processes. They, however, do not focus on monitoring and deployment. 


We found two ongoing works connected to ours. Firstly, Friederich et al. explore
in [4] Artificial Intelligence (AI) to encode dynamic processes. They propose
an MLOps-based pipeline, in which active learning will be used to predict and
record potentially important events during experiments in light microscopy.
Secondly, Zarate et al. present K2E [36], a new approach to governing data and
models. They investigate MLOps environments for creating, versioning and 
tracking datasets as well as models. They are still in the conceptualization step 
and plan to build their platform by following the Infrastructure as Code (IaC) 
paradigm. To the best of our knowledge, there is no MLOps approach similar 
to ours, particularly in the field of biomedical image analysis. 


2.2 AutoML and Meta-learning 


Automated Machine Learning (AutoML) is the process of automating different 
stages of the ML development cycle. These stages often include data prepro- 
cessing, feature engineering, model training and model evaluation. AutoML 
addresses problems such as Dynamic Algorithm Configuration (DAC),
Hyperparameter Optimization (HPO), Combined Algorithm Selection and
Hyperparameter Optimization (CASH) and Neural Architecture Search (NAS) [13].
Several researchers work towards building fully automated systems for their
applications. For example, Meisenbacher et al. explain in [18] how to achieve
AutoML for forecasting applications. They define five levels of automation for 
designing and operating forecasting models. 

Numerous tools have been developed, each addressing specific AutoML
subproblems. These tools often log metadata such as the hyperparameters
tried, pipeline configurations, model evaluation results, learned weights
and network architectures. Based on these experiences and other dataset
metadata, a model is trained with the aim of adapting faster to new tasks [11].
This is meta-learning, also known as learning to learn.


While it is clear that AutoML can be combined with MLOps to enable a high
level of automation in the ML development lifecycle [2, 30], the use of the
output of this combination for meta-learning does not seem to have been
investigated. In particular, the information acquired during the continuous
monitoring stages appears not to be widely researched in combination with
AutoML or meta-learning.


2.3 Scarce Image Data 


Scarce image data is a major problem in ML. In the field of image analysis,
particularly in biomedicine, the acquisition and annotation of images must be
done by highly skilled experts, require a considerable amount of time and
resources, and are often error-prone [27]. As a result of these constraints, the
available image data is often small in quantity, dominated by noise, or of low
quality, i.e., very weakly labeled or not labeled at all. Many approaches have
been developed to address data scarcity. On the one hand, there are image
processing techniques to augment image data such as scaling, rotation and
cropping. On the other hand, DL methods such as Generative Adversarial Networks
(GANs) [6] and Variational Autoencoders (VAEs) [14] are used to generate
synthetic images. These DL approaches often use Self-Supervised Learning (SSL)
to better understand how data points are sampled [22]. We notice, however, that
these solutions often focus on image classification tasks.
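
The classical augmentation operations named above (scaling, rotation and cropping) can be illustrated with a minimal sketch. It is an assumption-laden toy: images are plain nested lists, and the function names `rotate90`, `upscale_nearest` and `center_crop` are hypothetical helpers, not part of any cited approach.

```python
def rotate90(image):
    """Rotate an H x W image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def upscale_nearest(image, factor):
    """Scale an image by an integer factor using nearest-neighbor repetition."""
    return [[pixel for pixel in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def center_crop(image, height, width):
    """Cut a height x width window out of the middle of the image."""
    top = (len(image) - height) // 2
    left = (len(image[0]) - width) // 2
    return [row[left:left + width] for row in image[top:top + height]]


# Toy 2 x 2 "image"; real pipelines would operate on pixel arrays.
image = [[1, 2],
         [3, 4]]
rotated = rotate90(image)
scaled = upscale_nearest(image, 2)
cropped = center_crop(scaled, 2, 2)
```

Each operation yields a new training sample derived from an existing one, which is why such techniques enlarge a dataset without adding genuinely new information.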


2.4 Image Fingerprinting 


Fingerprinting is used in image processing to generate concise and distinct 
representations of images. It is useful for diverse objectives such as image 
retrieval, copyright protection, or image similarity analysis. We focus on image 
fingerprinting to measure the similarity between images and/or datasets. 


Ranging from simple pixel distribution methods [21] to DL approaches [15], 
there are numerous methods to compute similarity between images. Godau 
and Maier-Hein present in [5] an image fingerprinting approach that consists
of embedding images along with their labels in a fixed-length vector in order
to capture semantic similarities in biomedical image datasets. Molina-Moreno
et al. [22] build an autoencoder (AE), train it using SSL and obtain a
two-dimensional latent space in which the arrangement of image datasets
reflects their
similarity. Such similarity measures enable researchers to apply transfer learn- 
ing for their respective tasks. This is often achieved by selecting suitable pre- 
trained models and/or datasets for a new task based on the similarity measure 
obtained. Nevertheless, most state-of-the-art methods compute this similarity 
on a dataset level or on an image level and do not investigate the computation 
on an image patch level. 


To fill the observed gaps in the context of image analysis, we propose a new 
approach, which is fully described in the following section. 


3 Methodology 


This section describes the proposed methodology to address the different prob- 
lems identified and mentioned in Section 1 and Section 2. We first provide 
a global description of the system and subsequently delve into its individual 
building blocks. 


3.1 Overview of the proposed approach 


Figure 2 shows a conceptual architecture of our method. The main entry
point is a scientist bringing a new image analysis task modeled as the
triple $(I, A, T_A)$, where $I$ represents the image dataset, $A$ the performance
analysis metric, e.g., F1-Score, and $T_A$ the task, e.g., image classification.
At a time $t$, the performance metric $A_t$ can be computed as defined in
Equation (1), in which $m_t$ represents the current model, $I_t$ the current image
dataset and $Y_{MD,t}$ the metadata relative to $I_t$. The following stages of our
approach attempt to find the best model $m^*$ defined through Equation (2).


$A_t = m_t(T_A, I_t, Y_{MD,t}), \quad m_t \in M$    (1)


$m^*(t) = \arg\max_{m \in M} (A_t)$    (2)


Figure 2: Abstract architecture of the proposed MLOps-based image analysis approach. The
turquoise arrows indicate exchanges with the data database D and the brown arrows
exchanges with the model database M. The black arrows model the information flow
between different building boxes of the system and the dashed lines feedback from
production to development and to the scientist. The orange box represents the input
provided by the scientist. The red boxes are the meta-learning system that handles
dataset and algorithm selection. The yellow boxes display the AutoML pipeline. The
green boxes represent the continuous deployment and monitoring stage for production.
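
The selection rule in Equation (2) amounts to evaluating every candidate model with the chosen metric and keeping the one with the highest score. The following sketch illustrates this under simplifying assumptions: models are plain callables, accuracy stands in for the metric $A$, and all names (`accuracy`, `select_best_model`, the toy models) are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of matching labels; stands in for the performance metric A."""
    hits = sum(1 for a, b in zip(y_true, y_pred) if a == b)
    return hits / len(y_true)


def select_best_model(models, images, labels, metric):
    """Return (name, score) of the model that maximizes the metric."""
    best_name, best_score = None, float("-inf")  # start below any real score
    for name, predict in models.items():
        score = metric(labels, [predict(x) for x in images])
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Toy task: the "images" are integers and the label is their parity.
images = [0, 1, 2, 3, 4, 5]
labels = [x % 2 for x in images]
models = {
    "always_zero": lambda x: 0,
    "parity": lambda x: x % 2,
}
best_name, best_score = select_best_model(models, images, labels, accuracy)
```

In the proposed system the candidate set would come from the model database $M$ rather than from a hand-written dictionary.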


3.2 Image Similarity 


At $t = 0$, the newly provided problem is initialized and registered in the image
database $D$. The initial image dataset $I_0$ along with the task $T_A$, the
performance analysis metric $A_0$ set to $-\infty$ and other metadata $Y_{MD,0}$
such as dataset version, classes, data distribution, fingerprints, known
performances of available models for the same task $T_A$, etc. build an entry in
$D$. In order to discover images similar to $I_0$, its fingerprints $f_i(I_0)$ are
subsequently computed, saved and compared to those already present in $D$ through
a latent space embedding built in advance. This embedding will be periodically
retrained to ensure that its performance continues to meet expectations.
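
Comparing fingerprints in a latent space can be as simple as a nearest-neighbor search. The sketch below assumes two-dimensional fingerprints and Euclidean distance; the source does not specify the distance measure, and the function name and database entries are purely illustrative.

```python
import math


def nearest_fingerprints(query, database, k=2):
    """Return the names of the k database entries closest to `query`."""
    ranked = sorted(database.items(),
                    key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical 2D fingerprints already stored in the image database.
database = {
    "dataset_a": (0.0, 0.0),
    "dataset_b": (1.0, 1.0),
    "dataset_c": (5.0, 5.0),
}
closest = nearest_fingerprints((0.9, 1.2), database, k=2)
```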


As emphasized in [5, 22], with such fingerprints, deploying new ML models for
biomedical applications can be sped up by selecting appropriate pre-trained
models. Furthermore, this could alleviate scarcity issues by finding appropriate
images for image augmentation on the one hand and for pre-training on the
other hand.


3.3 Model Development Strategy 


Inspired by early attempts in [35] and [37], the model development strategy
aims to leverage the concept of meta-learning for scarce data. Given the triple
$(I, A, T_A)$, the metadata $Y_{MD}$ acquired during the registration phase and the
computed fingerprints $f_i(I)$, the goal is to find the top $k$ ($k > 1$) cheapest
and most efficient model development approaches. These approaches cover both
model and dataset and could range from model selection together with potential
weights and a development strategy, i.e., fine-tuning or retraining, to dataset
selection or conception on the fly. Dataset conception in this context involves
selecting the images in $D$ that are closest to $I$ and using these as training
data.


The reason we envision using not only $f_i(I)$ is to minimize error propagation
when fingerprinting is inaccurate. In this case, the meta-learner will rely
solely on the metadata $Y_{MD}$.


3.4 Automatic Model Development 


This stage is a direct application of the strategy established previously. It per- 
forms AutoML according to the strategy defined. Although AutoML is more 
computationally expensive than normal ML techniques, it is more efficient 
and faster in producing the best-trained model [2, 20]. It can be combined with
MLOps to improve a project’s automation level, thereby reducing the pipeline
configuration overhead. For example, in a retraining process or a new training
run, it could help solve the HPO problem faster at time $t$.


All development runs, including failing approaches, will be tracked and recorded
in the model database $M$. They will serve as input data for the meta-learner
in the previous stage and may help identify flaws in the data or in the whole
system. The best model $m^*$ is passed on to the last stage.


3.5 Continuous Deployment and Monitoring 


The best model developed $m^*$ is continuously deployed as a service and
monitored during production. On the one hand, we envision a deployment
framework fulfilling fundamental requirements such as independence from the
ML framework, rapid maintenance, accessibility, and parallel computing. Despite
the existence of numerous deployment frameworks, we will focus on
Representational State Transfer (REST) frameworks such as DEEPaaS [?] and
EasyMLServe [23]. On the other hand, the performance metric $A_t$ as well
as metrics defined by the scientist will be continuously monitored over time
and reported. This monitoring will be particularly investigated, as it could
help identify potential concept drift and decreases in performance, and act
accordingly by triggering the complete framework from the start.
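
One simple way to turn monitored metric values into a retraining trigger is to compare a rolling mean against the score measured at deployment time. The sketch below is only an illustration of that idea: the class name, the window size and the tolerance are assumptions, and the source does not prescribe a specific drift detection rule.

```python
from collections import deque


class PerformanceMonitor:
    """Tracks a rolling mean of a metric and flags a drop below a baseline."""

    def __init__(self, baseline, tolerance=0.05, window=3):
        self.baseline = baseline      # metric value measured at deployment
        self.tolerance = tolerance    # accepted absolute drop (assumed value)
        self.scores = deque(maxlen=window)

    def update(self, score):
        """Record a new metric value; return True if retraining should start."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough observations yet
        rolling_mean = sum(self.scores) / len(self.scores)
        return rolling_mean < self.baseline - self.tolerance


monitor = PerformanceMonitor(baseline=0.90, tolerance=0.05, window=3)
alerts = [monitor.update(s) for s in [0.91, 0.89, 0.90, 0.80, 0.78, 0.75]]
```

Averaging over a window avoids restarting the whole framework because of a single noisy measurement, at the cost of reacting a few observations later.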


The high information reuse, which will be investigated in this work, will serve 
the purpose of exploiting available feedback and enabling transfer learning. 
Our approach will mostly be implemented in the Python programming language,
due to its rich ecosystem of data science libraries and its expansive and highly
engaged user community. We also plan to apply containerization with Docker
[19] and its microservices, as it guarantees platform independence and enforces
reproducibility.


4 Preliminary Experiments 


This section describes the current state of the investigation. The experiments
performed mainly focus on the first stage of our approach described in Section
3, which consists in building a latent space embedding in which image data
can be represented along with their similarities. To this end, we build an
autoencoder, whose architecture is described in Section 4.1, and evaluate
it on biomedical image datasets presented in Section 4.2. Our implementation
can be found in [33].


Figure 3: Autoencoder. The encoder is based on the ResNet18 architecture, the latent space
representation is a two-dimensional space, and the decoder consists of stacked transposed
convolutional layers.


4.1 Autoencoder 


An autoencoder is a special type of neural network that takes an input $x$,
transforms it into a compressed and informative representation $x_{ae}$ and
reconstructs the initial input based on $x_{ae}$. The goal is to find an encoding
function $f : X \to X_{ae}$ and a decoding function $g : X_{ae} \to X$ such that
the difference between the reconstructed input $\hat{x} = g(f(x))$ and the initial
input $x$ is minimal. This difference is measured by a loss $L$:
$\min_{f,g} L(x, g(f(x)))$. The goal is to obtain a powerful representation
$x_{ae}$ that can be used for various tasks.


The architecture of our autoencoder can be seen in Figure 3. Our encoder
is a standard ResNet18-based [9] neural network. The final fully
connected layer is replaced by a linear layer mapping the 512-channel feature
vector to a two-dimensional vector. Because we are more interested in $x_{ae}$, we
build a simple decoder that consists of five stacked transposed convolutional
layers. We choose the Mean Squared Error (MSE) as the loss function, which
computes the difference between the original input image and the reconstructed
output image, and Adam as the optimizer.
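
To see how five transposed convolutions can expand a latent code back to a 32 × 32 image, the following sketch applies the standard output-size formula for transposed convolutions layer by layer, together with a plain MSE computation. The kernel size, stride, padding and the 1 × 1 starting size are assumptions chosen for illustration; the source does not state the decoder hyperparameters.

```python
def conv_transpose_out(size, kernel, stride, padding,
                       output_padding=0, dilation=1):
    """Output size of a transposed convolution along one spatial axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


def decoder_sizes(start=1, layers=5, kernel=4, stride=2, padding=1):
    """Spatial size after each of the stacked transposed convolutions."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes


def mse(original, reconstructed):
    """Mean squared error between two flattened images."""
    return sum((a - b) ** 2
               for a, b in zip(original, reconstructed)) / len(original)


sizes = decoder_sizes()                             # assumed hyperparameters
loss = mse([0.0, 1.0, 2.0], [0.0, 1.0, 4.0])        # toy pixel values
```

With these assumed settings each layer doubles the spatial size, so five layers map a 1 × 1 feature map to 32 × 32.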


Figure 4: MedMNIST v2 2D datasets


4.2 Datasets


To evaluate our autoencoder, we select all the 2D image datasets of MedMNIST
v2 [34], a benchmark collection for 2D and 3D biomedical image
classification. The 12 selected datasets can be found in Figure 4. They consist
of eight gray-scale image datasets (BreastMNIST, ChestMNIST, OCTMNIST,
OrganAMNIST, OrganCMNIST, OrganSMNIST, PneumoniaMNIST and TissueMNIST)
and four color image datasets (BloodMNIST, DermaMNIST, RetinaMNIST and
PathMNIST). The autoencoder is trained, validated and tested
on the respective dataset splits. All the images are processed as 3 × 32 × 32
images. For our architecture to be usable on all these datasets, the gray-scale
images were loaded and processed as three-channel images.
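
A common way to feed gray-scale images to a three-channel network is to replicate the single channel three times. The sketch below shows this on a nested-list image; that the authors used exactly this replication is an assumption, and `gray_to_three_channels` is a hypothetical helper name.

```python
def gray_to_three_channels(image):
    """Replicate an H x W gray-scale image into a 3 x H x W image."""
    return [[row[:] for row in image] for _ in range(3)]


gray = [[0, 128],
        [255, 64]]  # toy 2 x 2 gray-scale image
three_channel = gray_to_three_channels(gray)
```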


5 Results and Discussions 


We present in this section the results of the experiments described in the
previous section and discuss them.


Figure 5 shows the latent space representation of N = 10000 test samples
collected from the 12 2D image datasets presented in the previous section. We
notice that the color image datasets BloodMNIST, DermaMNIST and PathMNIST
form a cluster at the top left. This is most likely due to the similar
distribution of the pixel values of the respective images. These color image
datasets are, however, dissimilar to the color image dataset RetinaMNIST, in
which the images of the retina all have a black contour. The OrganAMNIST,
OrganCMNIST and OrganSMNIST datasets are mixed and almost indistinguishable,
so that no particular structures can be identified for them in the latent
space. This is because all three datasets contain images of the same organ taken
from different views. In OrganAMNIST, the images are acquired in an axial view,
in OrganCMNIST in a coronal view, and in OrganSMNIST in a sagittal view.
Moreover, all three datasets have multiple outliers that may hinder the model in
positioning new incoming images.


Figure 5: Latent space representation of the MedMNIST v2 2D datasets (test sets)


A better view of the latent space is provided in Figure 6, in which the mean
of all embedding vectors of the test images is computed for each dataset.
We notice four main clusters. The first cluster consists of the three-channel
image datasets BloodMNIST, DermaMNIST and PathMNIST. In the second cluster,
OrganAMNIST, OrganCMNIST and OrganSMNIST are very close to each other,
as are PneumoniaMNIST and ChestMNIST. The third cluster consists of
BreastMNIST and RetinaMNIST, and the final cluster of TissueMNIST and
OCTMNIST.
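
The per-dataset mean embedding can be computed by accumulating the two latent coordinates for each dataset and dividing by the number of samples. The sketch below uses made-up embeddings and a hypothetical function name; it only illustrates the averaging step, not the authors' implementation.

```python
from collections import defaultdict


def dataset_centroids(embeddings):
    """Average the 2D embedding vectors of each dataset.

    `embeddings` is an iterable of (dataset_name, (x, y)) pairs.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x, y and count
    for name, (x, y) in embeddings:
        acc = sums[name]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {name: (sx / n, sy / n) for name, (sx, sy, n) in sums.items()}


# Made-up latent vectors for two placeholder datasets.
embeddings = [
    ("dataset_a", (1.0, 2.0)),
    ("dataset_a", (3.0, 4.0)),
    ("dataset_b", (-1.0, 0.0)),
]
centroids = dataset_centroids(embeddings)
```

Because a mean is sensitive to extreme values, outliers in the latent space can shift such centroids noticeably.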


This autoencoder shows a promising ability to capture similarities between
images. Despite being in the early stages of our investigation, we strongly believe
that this approach to solving the image similarity question is encouraging and
could be fine-tuned to identify specific structures in image crops.


Figure 6: Mean value latent space representation of the MedMNIST v2 2D datasets (test sets)


6 Conclusion and Future Work 


In this paper, we presented a new approach to improve biomedical image anal- 
ysis. Our approach aimed to apply MLOps to image analysis tasks, particularly 
when the image dataset is scarce. To achieve this goal, we presented a multi- 
stage framework that leverages the existence of models and benchmark datasets 
to solve a given task. The first stage enables us to find image datasets similar 
to the images of the given task. The second stage applies meta-learning to 
select the best model development strategy for the given task. This strategy 
is executed via the third stage of AutoML. The final stage deploys the best- 
trained model and monitors the model’s performance continuously to achieve 
optimal performance. 


The preliminary experiments carried out in this paper mostly focused on the
first stage, which consists in computing image fingerprints to identify similar
datasets. We built a ResNet18-based autoencoder and showed that the resulting
2D latent space representation is interpretable enough to find similarities
between images. We, however, faced some challenges when the images all show
the same object but are taken from different angles. Moreover, we focused only
on 2D image datasets.


Our future research will, therefore, focus on extending the image fingerprint- 
ing process to 3D image datasets and even to the level of image patches and 
improving the representation of different artifacts. Understanding the image 
at this granularity level would enable us to generate data or labels to solve the 
data scarcity problem. We also plan to investigate the effects of outliers, as they 
may highly impact the similarity measurements. Finally, we will continue our 
research on the other stages of the proposed approach, including meta-learning 
for biomedical image analysis, AutoML and efficient model deployment and 
monitoring.


1 Introduction


The logistics industry plays a pivotal role in global trade, and efficient ware- 
house operations are essential for the seamless movement of goods. In recent 
years, prevailing job market conditions have presented significant difficulties 
in recruiting skilled workers in warehouse operations [1]. This shortage of 
skilled logistics workers has challenged companies to meet dynamic market 
demands and hindered effective workforce planning. The labor shortage results 
in increased operating costs and decreased overall efficiency due to suboptimal 
resource utilization. The key challenge for warehouse managers is the fluc- 
tuating and unpredictable nature of customer demand. Short-term customer 
orders, seasonal fluctuations, and rapidly changing market demands make it 
difficult for companies to forecast and plan their logistics workforce accurately. 
These dynamic demands often result in overstaffing during off-peak periods 
and understaffing during peak periods. Understaffing leads to unmet customer 
demand and, in some cases, customer churn. It also runs counter to a common
warehouse goal of maximizing service levels (i.e., the promise of fast and
accurate delivery) as a measure of differentiation from competitors. Overstaffing
leads to underutilization and inefficiencies. This results in financial losses, as
customer order fulfillment through picking at the point of delivery is the most
costly activity in the warehouse [2, 3, 4].


To tackle the pressing issues in workforce scheduling, it is essential to develop 
accurate forecasting frameworks that can efficiently predict delivery positions. 
Accurate forecasting enables companies to optimize their employee staffing 
in logistics, reducing the risks of both overstaffing and understaffing within 
the limited workforce. Currently, the forecast process is heavily reliant on the 
personal expertise and judgment of individual team members. These individu- 
als draw on their years of experience and intuition to estimate future demand, 
utilizing the number of pre-orders already recorded in their software system 
as a key input. This approach, although common in small and medium-sized 
warehouses, is inherently subjective and susceptible to human error and bias. 


In this context, integrating Machine Learning (ML) methods presents a promis- 
ing solution to enhance workforce scheduling efficiency [5, 6]. ML algorithms 
offer the capability to process vast amounts of data, identify complex patterns, 
and make data-driven predictions. By reframing the workforce optimization 
problem as a forecasting challenge, ML models can be leveraged to provide ac- 
curate and reliable one-day-ahead delivery position predictions. This approach 
not only improves transparency and explainability in the estimation process but 
also enhances the overall scheduling efficiency. 


2 Problem Statement and Data Description 


In the complex domain of workforce planning within logistics, we aim to
develop an accurate mathematical representation of the challenge. Instead of
directly tackling workforce scheduling, we transform it into a prediction
problem, using various ML models to forecast the next day’s delivery positions.
The problem is defined mathematically as minimizing the root-mean-square
error between the actual and predicted number of delivery positions using the
optimal model and its parameters:


$i^* = \arg\min_i \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(D_t^{act} - D_t^{pred,i}\right)^2}$    (1)


with

• $D_t^{act}$: the actual number of delivery positions for day $t$.

• $D_t^{pred,i}$: the predicted number of delivery positions for day $t$, using the
  $i$-th ML model.


Simultaneously, we aim for the predicted delivery positions $D_t^{pred,i^*}$ from
the best model to closely match the needed number of workers $W_t$. The
relationship between these is determined by past company data and workforce
planning rules. This relationship is given by


$W_t = h(D_t^{pred,i^*})$    (2)


where $h(\cdot)$ represents the historical link between delivery positions and
required workforce. Nonetheless, our focus is on pinpointing the most effective
ML technique, rather than deducing the exact form of $h(\cdot)$. The study relies on
delivery position data from a collaboration between the Technical University of 
Cologne and a major electrical engineering firm, with adjustments made using 
a constant factor X to safeguard financial confidentiality. Table 1 provides a 
comprehensive statistical overview of the initial time series data. 


The actual data entries amount to 1337 data points, given the non-operational 
days like weekends and winter breaks. Following an 80/20 split for parti- 
tioning, the training set contains 1069 entries, and the test set has 268. The 
initial data inspection highlighted significant downward fluctuations, leading 
to the removal of entries identified as outliers and those with fewer than 200 
delivery positions. This refined dataset, now with 1321 entries, presents a 
more consistent distribution conducive to analysis. Cleaning paved the way 
for feature engineering, introducing variables like time details, lag intervals, 
and multiple rolling means to serve as inputs for the ML models. 
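
Lag and rolling-mean features for one-day-ahead forecasting can be derived from the series itself, as the following sketch shows. The lag and window choices, the function names and the chronological (unshuffled) 80/20 split are assumptions for illustration; the source does not list the exact feature set or state how the split was drawn.

```python
def make_features(series, lags=(1, 2), windows=(3,)):
    """Build lag and rolling-mean features for one-step-ahead forecasting.

    Returns (features, target) pairs. Only values before day t are used,
    so the target never leaks into its own features.
    """
    start = max(max(lags), max(windows))
    rows = []
    for t in range(start, len(series)):
        features = {f"lag_{k}": series[t - k] for k in lags}
        for w in windows:
            features[f"rolling_mean_{w}"] = sum(series[t - w:t]) / w
        rows.append((features, series[t]))
    return rows


def chronological_split(rows, train_fraction=0.8):
    """Split without shuffling so the test period follows the training period."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


series = [10, 20, 30, 40, 50, 60, 70]  # made-up daily delivery positions
rows = make_features(series)
train, test = chronological_split(rows)
```

Applied to the sizes reported above, `int(1337 * 0.8)` gives 1069 training entries and leaves 268 for testing.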


Table 1: Feature summary

Feature                           Start Date   End Date     Time Span
Lagerbewegung Zeitpunkt           2018-01-02   2023-05-08   1952 days

Feature                           Mean      Std.     Min   Max
Lagerbewegung Pick ERP-Auftrag    2860.26   697.91   1     4525

3 Methods & Metrics 


For forecasting, five primary models were utilized: Random Forest, XGBoost, 
LightGBM, Support Vector Regression (SVR), and Convolutional Neural Net- 
works (CNN). Random Forest is an ensemble method known for its robustness 
and ability to manage non-linear relationships [7]. XGBoost and LightGBM 
are both gradient boosting frameworks, with the former being praised for its 
speed and flexibility [8], and the latter for efficiency and leaf-wise tree growth 
[9]. SVR excels in high-dimensional spaces and offers kernel function flexibil- 
ity [10]. CNNs, while dominant in image classification, have shown prowess in 
time-series forecasting, capturing both local and global temporal dependencies 
[11]. 


Beyond the primary models, two ensemble strategies, Stacking and Averaging, 
were evaluated. Averaging involves computing the mean of individual model 
predictions, noted for its robustness [12]. Stacking harnesses multiple base 
models, in our case leveraging ridge regression as the meta-model [13]. 
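
The averaging strategy can be written in a few lines: for every forecast day, the predictions of all base models are averaged with equal weight. The sketch below uses made-up forecasts and a hypothetical function name; stacking would instead fit a meta-model (ridge regression in this work) on the base predictions and is not shown.

```python
def average_ensemble(predictions_per_model):
    """Average the forecasts of several models day by day."""
    n_models = len(predictions_per_model)
    return [sum(day) / n_models for day in zip(*predictions_per_model)]


# Made-up forecasts of three base models for three days.
forecasts = [
    [100.0, 200.0, 300.0],
    [110.0, 190.0, 330.0],
    [90.0, 210.0, 270.0],
]
ensemble = average_ensemble(forecasts)
```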


Hyperparameter tuning for all models was automated using the Optuna pack- 
age, streamlining optimal value discovery [14]. 


Model performance was gauged using Root Mean Square Error (RMSE) and
Mean Absolute Error (MAE). RMSE squares the errors and therefore penalizes
large errors more heavily, whereas MAE weights all errors in proportion to their
magnitude [15, 16]. Together, both metrics support a holistic evaluation of model
accuracy.
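
The different behavior of the two metrics shows up when two forecasts have the same total absolute error but distribute it differently. The sketch below uses made-up numbers and hypothetical model names; selecting the model with the lowest RMSE mirrors the selection criterion stated in the problem definition.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 100.0, 100.0, 100.0]
candidates = {
    "steady": [110.0, 90.0, 110.0, 90.0],    # four errors of size 10
    "spiky": [100.0, 100.0, 100.0, 140.0],   # a single error of size 40
}
scores = {name: (mae(actual, pred), rmse(actual, pred))
          for name, pred in candidates.items()}
best = min(scores, key=lambda name: scores[name][1])  # lowest RMSE wins
```

Both forecasts have an MAE of 10, but the single large miss doubles the RMSE of the second one.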


Table 2: Performance metrics of the different modeling approaches

Algorithm                        MAE      RMSE
LightGBM                         335.56   440.63
CNN                              382.83   518.00
SVR                              356.51   470.66
XGBoost                          331.86   437.15
Random Forest                    323.78   429.55
Stacking Ensemble with CNN       683.47   796.16
Average Ensemble with CNN        318.91   416.48
Stacking Ensemble without CNN    365.08   476.35
Average Ensemble without CNN     313.98   413.96


4 Results


In evaluating the base models, the tree-based algorithms (LightGBM, XGBoost,
and Random Forest) showed similar performance, with RMSEs between 429 and
441 and MAEs ranging from 323 to 336. The SVR produced a conservative
forecast, with MAE and RMSE values of 356.51 and 470.66, respectively. The
CNN trailed in performance, especially post-Christmas, resulting
in an MAE of 382.83 and an RMSE of 518.00.


The ensemble strategies exhibited contrasting results. The stacking ensemble,
despite capturing the time series’ structure, tended to overestimate, with an
MAE of 683.47 and an RMSE of 796.16. In contrast, the average ensemble
outperformed all base models, achieving an MAE of 318.91 and an RMSE of 416.48.


Considering the subpar performance of the CNN, ensembles were recomputed 
without CNN inputs. This led to an overall improvement. The average ensem- 
ble showed an MAE of 313.98 and an RMSE of 413.96, while the stacking 
ensemble registered 365.08 and 476.35 for MAE and RMSE, respectively. 
Even though the average ensemble’s metrics were superior, the stacked method 
better captured individual peaks. 


The performance metrics for all the discussed modeling approaches are
summarized in Table 2.


5 Discussion & Outlook


Besides confirming a fundamental predictability in the dataset used, this work
also showed that tree-based models like Random Forest, XGBoost and LightGBM
are suitable algorithms for this task. Beyond that, averaging ensembles further
improved the forecasting accuracy, whereas stacking ensembles yielded higher
errors (Table 2) but better captured individual peaks.


While the SVR approach performed slightly worse than, but in the same
approximate range as, the tree-based models, the CNN clearly showed inferior
results, especially when exposed to irregularities like the post-Christmas dip
in the dataset. The small size of the dataset is a possible reason for
this lack of performance. The neural network might not have been exposed to a
sufficient amount of data to pick up the more complex patterns. As the dataset
grows over time, we could expect an improvement in the CNN’s performance.


Going forward with these approaches, a possible next step is the incorporation 
of external variables that might influence the number of delivery positions. 
These variables can range from calendrical data to pre-order information and 
even weather data. 


Furthermore, different model architectures should be examined, as this work 
mainly focused on tree-based models. Diversifying the model architectures 
could boost the forecasting effectiveness even further. 


In summary, this research offers a promising start toward optimizing workforce 
scheduling in small and medium-sized warehouses in the logistics sector by 
leveraging ML methods.


1 Introduction


The aim of this paper is to investigate and discuss the use of generative neural 
networks to reconstruct handcrafted adaptive music. We choose the dynamic 
compositions from the 1991 video game Monkey Island 2 [1], which has con- 
sistently been recognized as a role model and masterpiece in this field [2, 3]. 
The music in this game is generated in real-time during gameplay, leading us 
to define our task as the successive prediction of note events using autoregres- 
sive models. Furthermore, the game’s music adapts to player actions, so the 
generative task additionally is conditional. 


In Section 2, we explain that music data contains very different types of in- 
formation and show how similarly constructed statistical errors can have very 
different effects according to cognitive music perception, which brings addi- 
tional complexity to this problem domain. In Section 3, we explain how the 
dataset was obtained and what attributes it has. The dataset generator itself is 
openly available¹ for further academic or educational use. We argue that, on
the one hand, music generation is a fun problem domain without ethical issues, 
which has the potential to motivate people to engage with AI technology. On 


1 https://github.com/fabianostermann/WoodtickWalkingSimulator
the other hand, it makes a challenging benchmark for complex multi-task
learning concepts and multi-objective optimization strategies. Moreover,
coping with the adaptive real-time aspect of video game music is exceptionally 
difficult and, to our knowledge, poses a unique challenge. For our practical 
investigation, Section 4, we define seven problem dimensions specific to this 
task. For each dimension, we conduct experiments and discuss the unique 
challenges associated with processing music data. In the process, we propose 
areas for further investigation, which will be summarized in Section 5. 


2 Music as a problem domain 


Music is a complex matter. Its description needs a lot of dimensions [4], 
which makes it vulnerable to combinatorial explosion. All attempts of sym- 
bolic representation, like modern western notation, need some form of severe 
simplification.² The example of MIDI is used below to explain which minimum set
of dimensions is required. 


MIDI [5] is a widely used music data communication protocol that was also 
used for the music of Monkey Island 2. MIDI was designed as a standard to 
connect all kinds of electronic musical instruments. It defines atomic musical 
instructions called MIDI events that carry byte-sized information about pitch,
timing and loudness.³ The beginning and the end of a note are divided into
two different events. The desired instrument sound is implicitly set. The MIDI
stream consists of 16 different channels. A so-called program change event is
sent to a channel to request an instrument change. A special case is channel
10, which is used for drums, and where the pitch information is remapped to
specify which drum instrument to trigger (e.g., snare, bass drum, hihat).


MIDI and its cluttered specification appear a bit dated by today’s standards.
Its main advantage is its wide popularity. A lot of MIDI-encoded music exists 
and can be directly used to build machine learning datasets [6]. However, it is 


2 An exception may be the approach to process music at the audio signal level [7, 8]. 

3 Be aware that the loudness of a note in the context of MIDI usually is referred to as velocity, 
because the velocity of pressing down a key on a piano keyboard determines its loudness. We 
stick to the term loudness here since velocity can easily be misinterpreted as a temporal property. 
usually not practical to use MIDI events directly as input data for classification
or generation tasks.⁴ Therefore, it is necessary to convert a stream of MIDI
events into a sequence of semantic features. 


We choose to define every note as a 5-tuple 


$\mathrm{note}_i = (\Delta\mathrm{time},\ \mathrm{duration},\ \mathrm{loudness},\ \mathrm{pitch},\ \mathrm{instrument})$ (1)


and a music sequence of length n as $\mathrm{seq} = \{\mathrm{note}_1, \mathrm{note}_2, \ldots, \mathrm{note}_n\}$. The time
difference Δtime is calculated as the distance in seconds to the previous note
event. The duration is the distance between the beginning and the end of the
note in seconds. The loudness is the MIDI velocity value normalized to [0,1].
The pitch is one of {0,1,...,127}, which linearly maps to the keys of a piano,
where 60 denotes middle C. The instrument is an integer label determined by
searching in the MIDI stream for the last program change event that occurred on
the same MIDI channel.⁵
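
As an illustration of the 5-tuple encoding, the following minimal sketch converts already decoded notes into tuples of the form (Δtime, duration, loudness, pitch, instrument). It is not the authors' implementation: the `RawNote` container, the `encode` function, measuring Δtime from onset to onset, setting Δtime of the first note to 0, and normalizing velocity by 127 are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RawNote:
    """Hypothetical decoded MIDI note (field names are illustrative)."""
    start: float      # note onset in seconds
    end: float        # note offset in seconds
    velocity: int     # MIDI velocity, 0..127
    pitch: int        # MIDI pitch, 0..127 (60 = middle C)
    instrument: int   # integer label from the last program change


def encode(notes: List[RawNote]) -> List[Tuple[float, float, float, int, int]]:
    """Encode notes as (dtime, duration, loudness, pitch, instrument)."""
    ordered = sorted(notes, key=lambda n: n.start)
    seq = []
    # Assumption: the first note has a time difference of 0.
    prev_start = ordered[0].start if ordered else 0.0
    for n in ordered:
        dtime = n.start - prev_start        # onset-to-onset distance
        duration = n.end - n.start
        loudness = n.velocity / 127.0       # assumed normalization to [0, 1]
        seq.append((dtime, duration, loudness, n.pitch, n.instrument))
        prev_start = n.start
    return seq
```

For example, two overlapping notes starting at 0.0 s and 0.25 s yield a Δtime of 0.0 for the first tuple and 0.25 for the second.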


Note that this scheme is one possibility of many and that the concrete encoding
is usually important [9, 10]. Also note that the different types of information
are not equally important to the cognitive perception of the music, since, e.g.,
Δtime can be allowed to vary by a few milliseconds without the perception of
rhythmical flaws, while missing a pitch by only a semitone⁶ instantly results in
severe harmonic dissonances. Changes in duration, loudness or instrument are
perceived as variations rather than mistakes. Among all, Δtime and pitch are
(by far) the most important features for recognizing music.


That said, music makes a complex multi-objective learning problem that mixes
regression (Δtime, duration, loudness) and classification (pitch, instrument)
tasks. Therefore, it can be approached by multi-task learning, which has already
been applied to music, most successfully in music transcription [11, 12].


4 For example, we tested direct prediction of byte events on our training data. It resulted in
an accuracy of 90%+, which, however, is not at all useful for actual generation since the smallest
mistake leads to fatal syntax corruption.

5 The actual code implementation makes use of the Python package pretty_midi [13].

6 A semitone is the smallest possible distance between two different tones in western tonal music. 
3 Dataset of adaptive video game music 


The 5-tuple encoding above can be applied to all music of information depth 
equal to simple MIDI. For the scope of our study, we decided to learn from 
data that consist of the adaptive game music that was originally composed and 
programmed by Michael Land and Peter McConnell for the all-time adventure 
game classic Monkey Island 2 [1]. Despite the game’s release in 1991, the 
significant effort invested in creating the adaptive compositions, including the 
manual crafting of musical transitions between numerous points in the music 
variations, remains extraordinary. It continues to serve as a role model in the 
eyes of experts and critics to this day [2, 3].⁷


We selected a specific piece of music from the game: In the town known as 
Woodtick, which the main character visits at the beginning of the game, the 
music changes as the character enters different locations. But the music is not 
just replaced or blended over. Instead, variations of the town music smoothly 
appear in various manually-prepared dynamic transitions. We have developed 
a random walking simulator for the game, which is openly available.⁸ It
automatically walks the main character through Woodtick and records the stream
of MIDI events using the ScummVM emulator [14]. We also track the location 
change events that trigger the musical transitions in the MIDI stream. 


Since a note from a music sequence in close scope depends linearly on its 
previous notes, the process of creating the sequence can be modeled as an 
autoregressive task. However, in a wider scope, changing the location leads to 
distinct musical continuations. Therefore, the event of changing the location 
must be considered as a conditional input $c_t$ for the autoregressive model


$\mathrm{note}_i = f(\mathrm{note}_{i-1}, \mathrm{note}_{i-2}, \ldots, \mathrm{note}_{i-p}, c_t),$ (2)


7 The reason is interesting: The simple MIDI protocol was dropped when waveform sample 
playback became possible on home computers. This leap in acoustic complexity made
handcrafting complex adaptive music compositions infeasible [2]. The game
industry opted for the cinematic feel at the expense of musical immersion. And this trend 
continues to this day, with each increase in computing power primarily used for better and 
better game graphics. However, this circumstance is likely to change soon, as the power of 
home computers is increasing faster and faster due to current improvements in AI technology 
[15], which could reach a limit of necessary realism (as it was for 4k screen resolution). 

8 https://github.com/fabianostermann/WoodtickWalkingSimulator
where p is the context length, i.e., the number of considered previous events.
We will use a neural network to approximate the function f.
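
To make the conditional autoregressive setup more concrete, the sketch below builds training examples from an encoded sequence: each example consists of the p previous notes, the condition valid at the predicted note, and the note to predict. The function name `make_windows`, the array layout, and the representation of the location condition as one integer per note are assumptions for illustration only.

```python
import numpy as np


def make_windows(notes, conditions, p=8):
    """Build (context, condition, target) examples for an autoregressive model.

    notes:      array-like of shape (T, 5), one encoded 5-tuple per note
    conditions: array-like of shape (T,), hypothetical location label per note
    p:          context length, i.e. number of previous notes fed to the model
    """
    notes = np.asarray(notes, dtype=float)
    conditions = np.asarray(conditions)
    contexts, conds, targets = [], [], []
    for i in range(p, len(notes)):
        contexts.append(notes[i - p:i])   # note_{i-p} ... note_{i-1}
        conds.append(conditions[i])       # condition for the predicted note
        targets.append(notes[i])          # note_i
    return np.array(contexts), np.array(conds), np.array(targets)
```

A sequence of T notes yields T - p examples, so a recording of 10 notes with p = 8 produces two examples with contexts of shape (8, 5).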


The big difference to models usually applied to music generation is that we are 
not trying to create a general framework for diverse musical outcomes [16, 8]. 
Instead, we want to approximate a single composition. However, since this 
composition is adaptive, it differs every time it is played. But the number 
of possible variations is limited for a finite period of playback. That means, 
although the model is autoregressive as described, it does not need to generalize 
for all possible inputs. 


4 Problem dimensions 


In this section, we present seven different problem dimensions of our task: 
neural network architecture and type, context length, data input representa- 
tion, multi-task learning, complexity reduction, multi-objective optimization 
and rare events. This list of seven is not exhaustive, as there are additional 
parameters, e.g., batch size, learning rate or the choice of the optimizer itself.⁹
However, we chose to focus on discussing these seven dimensions because we 
found them to be less universal and, in some aspects, unique to the context of 
music data. We will present some practical experiments on how to optimize 
parameters for each of the dimensions. Given their interdependency, where 
changes in one parameter can consistently affect results in other problem
dimensions, an all-at-once optimization approach would be ideal. However, since
this is beyond a feasible level of complexity, we decided to propose a chain of
optimizations, knowing that changing their order may influence the final
results. Note that it was also not possible to exhaustively optimize each
dimension. We will present exemplary results, discuss how each problem
dimension should be addressed, name applicable strategies and point out aspects
for future investigations.


To optimize the multi-task models, we used an averaging loss function as 
the overall mean of cross entropy loss for pitch and instrument and mean 


9 We chose a batch size of 4096, a learning rate of l_r = 0.001 and the Adam optimizer [17].
Table 1: Loss and accuracy score comparison of different neural architectures and types


type   recurrent layers   recurrent units   dense   loss         accuracy
LSTM   2                  256               -       1.081±0.01   0.807±0.03
LSTM   1                  256               128     1.105±0.01   0.712±0.01
LSTM   1                  256               -       1.107±0.01   0.706±0.02
GRU    2                  256               -       1.147±0.00   0.693±0.02
GRU    1                  256               128     1.117±0.01   0.657±0.01
GRU    1                  256               -       1.178±0.01   0.565±0.02


absolute error loss for the regression tasks of Δtime, duration and loudness.
We used the latter instead of mean squared error to mitigate averaging issues
associated with large magnitude differences. For evaluation, we will mainly
provide accuracy scores. For Δtime, duration and loudness, we simply defined
a deviation of less than 0.05 as sufficiently accurate.¹⁰ Due to random weight
initialization, we conducted 5 statistical repetitions on 5 different simulator
runs, each consisting of 2 hours of music. Every loss and accuracy score below
is reported as mean ± standard deviation of these 5 runs.
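
The averaged loss and the tolerance-based accuracy described above can be sketched as follows. This is a plain NumPy illustration under stated assumptions, not the training code used for the experiments: all function names are hypothetical, and the cross entropy is computed from already normalized class probabilities.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Mean cross entropy for predicted class probabilities of shape (N, C)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))


def mean_absolute_error(pred, target):
    """Mean absolute error for a regression objective."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(np.abs(diff)))


def averaged_loss(class_losses, regression_losses):
    """Equally weighted mean over all single-objective losses."""
    return float(np.mean(list(class_losses) + list(regression_losses)))


def regression_accuracy(pred, target, tol=0.05):
    """Share of predictions that deviate by less than tol from the target."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(np.abs(diff) < tol))
```

With two classification losses (pitch, instrument) and three regression losses (Δtime, duration, loudness), `averaged_loss` gives each of the five objectives the weight 1/5.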


4.1 Architectures and neural network types 


The first problem when working with neural networks is always to choose a
network type and to determine its structure and size. There are a few general
rules to follow [18], but most parameters must be reconsidered with every new
problem. Since we have time series data, the natural choice is to use recurrent
networks. We chose the popular long short-term memory (LSTM) unit [19]
and compared it with its less complex but more efficient variant, the gated
recurrent unit (GRU) [20].


Table 1 shows that an LSTM with 2 layers is performing best for accuracy and 
loss. A concatenated dense layer helped in case of only one layer. The same 
applies to GRU, which in comparison performed worse. Note that using only 


10 For Δtime and duration, 0.05 corresponds to 50 ms.
Figure 1: Accuracy score, loss and training time in relation to context length. Circle markers 
correspond to the left y-axis, cross markers to the right. 


the last hidden state of the LSTM gave similar results to GRU, but using hidden 
state and last cell state improved the results significantly. 


We can clearly see that the choice of model type and architecture is, as ex- 
pected, of critical importance. Another approach could be to use 1D convolu- 
tional filters. State-of-the-art autoregressive modeling likely involves the use 
of transformer models [21, 9], the application of which remains future work. 


4.2 Context length 


The number of considered previous events is a crucial parameter for
autoregressive tasks. It determines how much context information the model has
to predict the next event. For this problem dimension, we used the 2-layered
LSTM, which performed best in the previous section. Figure 1 shows that the
accuracy already reaches a plateau at 8 events. After that, it only improves
slightly. However, the time needed for calculation increases further since the
number of weight parameters inside the LSTM increases non-linearly. With
p = 16, the calculation needs about one third more time than with p = 8.¹¹ In


11 34 min compared to 26 min on an NVIDIA A100 GPU.
order to have more capacity for diverse experiments on the following
dimensions, we decided to use p = 8 from here on. Please note that a sequence
length of p = 8 in music holds much information (cf. Eq. 1). Just imagine: if
someone sings 8 notes of a well-known melody to you, you will probably be able
to recognize it.


4.3 Input representation 


The internal representation of data can have great influence on the model
performance [22]. The categorical information about pitch and instrument was
previously one-hot encoded with n_c = 60 unique classes for pitch and n_c = 13
different instruments. With a sequence length of p = 8 and the 3 other tasks,
there were already 8 · (60 + 13 + 3) = 608 input values. As the model complexity
is relative to the input size, we aimed to decrease the number of inputs to the
LSTM by adding an input embedding layer [18]. The embedding size was de- 
termined as | | + 1. This definition can be considered another hyperparameter 
to be optimized. 
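
The input-size arithmetic for the one-hot representation can be checked with a short sketch. The helper names and the example values are hypothetical; only the class counts (60 pitches, 13 instruments), the 3 regression values and the context length p = 8 are taken from the text.

```python
import numpy as np

N_PITCH, N_INSTR, N_REG, P = 60, 13, 3, 8


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at the given index."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def encode_note(dtime, duration, loudness, pitch_idx, instr_idx):
    """One-hot encode the class labels and append the regression values.

    pitch_idx and instr_idx are assumed to be indices into the sets of
    unique pitch and instrument classes, not raw MIDI numbers.
    """
    return np.concatenate([
        one_hot(pitch_idx, N_PITCH),
        one_hot(instr_idx, N_INSTR),
        [dtime, duration, loudness],
    ])


# A context window of P notes, flattened into one input vector.
window = np.concatenate(
    [encode_note(0.1, 0.2, 0.5, pitch_idx=i, instr_idx=0) for i in range(P)]
)
print(window.size)  # 8 * (60 + 13 + 3)
```

Each note contributes 76 values, of which only 5 are non-zero for the example values above (two one-hot entries and three regression values), which illustrates why a denser input representation is attractive.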


By applying input embedding, we increased the accuracy from 0.807±0.03 to
0.867±0.02 with a decreasing loss from 1.081±0.01 to 0.867±0.02. A further
approach to directly use the integer label as input,¹² which reduces the input
size to 1, performed worse with an accuracy of just 0.716±0.02.


4.4 Multi-task learning vs. ensemble learning 


Learning all objectives separately and predicting them with an ensemble of
models improves evaluation results [23, 18] but comes at a higher computational
cost. If not calculated in parallel, it is far slower in inference (and
training, cf. Table 2), which might cause severe problems when used for
real-time conditioned generation or on low-performance hardware.¹³


12 For pitches, this may be called an ordinal encoding, since it provides a natural order. 
13 Regarding the present context, both requirements are met for video games. In addition,
computational resources are oftentimes reserved for rendering high-resolution 3D graphics. 
Table 2: Loss and accuracy scores for different ensemble and multi-task models. Values
belonging to one individual model share a frame.


accuracy          one model per task   regression and classification models   one multi-task model
Δtime             0.999±0.00           0.955±0.01                             0.945±0.01
duration          0.980±0.00           0.876±0.01                             0.862±0.02
loudness          0.975±0.00           0.977±0.00                             0.880±0.03
pitch             0.907±0.01           0.670±0.02                             0.696±0.03
instr.            0.950±0.00           0.953±0.01                             0.954±0.00
mean              0.962±0.00           0.886±0.00                             0.867±0.02
Σ training time   106 min              43 min                                 29 min


Table 2 shows that learning single tasks (left column with numbers) is much
easier than learning multiple tasks at once (right column). Learning the
regression tasks (Δtime, duration, loudness) or the categorical tasks separately
(middle column) only improves scores slightly (except for loudness). However,
handling the interdependency between the objectives is a major topic. For
example, the choice of the next pitch is highly dependent on which instrument
is chosen. So far, they were predicted in parallel, but the accuracy may be
improved by using a hierarchical learning approach to predict the objectives
one after another.


4.5 Complexity reduction 


This dimension is to be understood as the output equivalent of the input
representation from Section 4.3. As Table 2 shows, predicting pitch is the most
difficult objective. That is probably because it has the most classes and the
strongest dependencies on the previous notes. In addition, only a nearly 100%
accurate prediction is satisfactory from a psychological point of view. When
the timing is slightly off or the melody is played by the wrong instrument,
the mistake is not perceived as seriously as choosing the wrong pitch.¹⁴
In this context, the semantic nature of music data can be leveraged to
further reduce complexity. The pitch information can be divided into two pieces


14 An exception is the drums, because when playing back a melody on the MIDI drum channel, 
it becomes completely unrecognizable. Hence, another approach to complexity reduction could 
involve segregating the prediction of drum instruments from melody and harmony instruments. 
Table 3: Accuracy scores for predicting pitch divided into pitch class and octave.


pitch        pitch class   octave       Δtime        duration     loudness     instrument   mean
0.696±0.03   -             -            0.945±0.01   0.862±0.02   0.880±0.03   0.954±0.00   0.867±0.02
-            0.802±0.01    0.908±0.01   0.900±0.02   0.795±0.04   0.817±0.04   0.953±0.00   0.862±0.02


of information: octave and pitch class. This introduces an additional task to the
multi-task model but has the advantage that predicting one of 12 pitch classes
can be learned with a higher accuracy. In addition, playing in the wrong octave
(of 5) is not perceived as nearly as disharmonic as any smaller semitone error.¹⁵
Table 3 shows a boost in overall pitch accuracy by splitting up the task. But
the multi-task difficulty increases and thus all the single regression tasks drop
in accuracy. However, the average accuracy remains nearly unchanged.
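
Splitting a MIDI pitch into pitch class and octave is a simple integer division by 12. The following sketch shows one possible convention; the function names are hypothetical, and the octave index here counts from MIDI pitch 0, so a model restricted to the 5 octaves present in the data would additionally need an offset.

```python
def split_pitch(pitch: int) -> tuple:
    """Split a MIDI pitch (0..127) into (pitch_class, octave_index)."""
    octave, pitch_class = divmod(pitch, 12)
    return pitch_class, octave


def merge_pitch(pitch_class: int, octave: int) -> int:
    """Inverse of split_pitch: rebuild the MIDI pitch."""
    return 12 * octave + pitch_class
```

With this convention, middle C (MIDI pitch 60) maps to pitch class 0 and octave index 5, and an octave error changes the pitch by a multiple of 12 while leaving the pitch class intact.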


Another approach to complexity reduction is to transform each regression
task into a classification task by binning or clustering. This might be
advantageous since continuous regression is at times more difficult, especially
when combined with a classification task in a multi-task learning problem
(cf. Section 4.4).
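
A minimal sketch of the binning idea, assuming hand-picked bin edges: `np.digitize` maps each continuous value to the index of the bin it falls into, which can then serve as a class label. The edge values below are purely illustrative and are not taken from the experiments.

```python
import numpy as np

# Hypothetical bin edges in seconds, e.g. for Δtime or duration.
EDGES = np.array([0.05, 0.1, 0.25, 0.5, 1.0])


def to_bins(values, edges=EDGES):
    """Map continuous values to integer class labels 0..len(edges)."""
    return np.digitize(np.asarray(values, dtype=float), edges)
```

Values below the first edge receive label 0, and values at or above the last edge receive label `len(edges)`, so five edges produce six classes.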


A more complex approach may be to encode segments of the music with a 
(variational) autoencoder that is then controlled by another agent that is re- 
warded to recreate the composition in a reinforcement learning setting. More- 
over, this approach would be able to come up with some novel variations of the 
music. 


4.6 Multi-objective optimization 


Since the present task is inherently multi-objective, we can also consider
improving the loss function itself. Up to here, the loss value was calculated as
the equally weighted mean of the loss values of all objectives. Since
pitch was identified to be the hardest but most important objective, we try to

15 The auditory phenomenon of tonal fusion [24] explains that two notes played in the interval of
one octave are most likely to be perceived as a single tone among all possible intervals. 
Table 4: Accuracy scores of different prioritization of pitch under weight w_p.


w_p          pitch        Δtime        duration     loudness     instrument   mean
0.0          0.017±0.00   0.859±0.03   0.735±0.04   0.778±0.04   0.950±0.00   0.668±0.02
0.2 (mean)   0.681±0.02   0.949±0.01   0.869±0.01   0.885±0.01   0.955±0.00   0.868±0.01
0.5          0.613±0.02   0.757±0.04   0.614±0.02   0.670±0.03   0.950±0.00   0.721±0.02
0.8          0.616±0.03   0.579±0.01   0.451±0.03   0.436±0.04   0.948±0.00   0.606±0.02
1.0          0.629±0.02   0.053±0.01   0.051±0.01   0.038±0.02   0.951±0.01   0.344±0.00
random       0.605±0.02   0.802±0.02   0.679±0.05   0.603±0.05   0.955±0.00   0.729±0.01


determine whether it is possible to boost its accuracy by prioritizing it over
the other objectives. The following formula will be used for prioritizing
objective p with weight $w_p$ over all the other n objectives:


$\mathrm{loss} = w_p \cdot \mathrm{loss}_p + \sum_{i=1}^{n} \frac{1 - w_p}{n} \cdot \mathrm{loss}_i$ (3)


For $w_p = 0.2$, this formula corresponds to the average weighting used before,
since each of the n = 4 other objectives then also receives the weight
(1 − 0.2)/4 = 0.2.
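
The prioritization can be written as a small function, assuming the prioritized objective receives the weight w_p and the remaining weight 1 − w_p is shared equally among the n other objectives. The function name and the example loss values are hypothetical.

```python
def prioritized_loss(loss_p, other_losses, w_p):
    """Weight the prioritized loss with w_p and share 1 - w_p among the rest."""
    n = len(other_losses)
    return w_p * loss_p + sum((1.0 - w_p) / n * loss for loss in other_losses)


# Illustrative loss values for pitch and the four other objectives.
pitch_loss = 1.0
others = [2.0, 3.0, 4.0, 5.0]
print(prioritized_loss(pitch_loss, others, w_p=0.2))  # equals the plain mean
```

With four other objectives, w_p = 0.2 reproduces the plain mean of all five losses, w_p = 1.0 ignores everything except the prioritized objective, and w_p = 0.0 ignores the prioritized objective entirely.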


Table 4 shows that the applied weighting procedure is not able to boost the
accuracy of the pitch prediction. As expected, a w_p of 0.0 prevents any
learning success for pitch, but surprisingly the other objectives do not benefit
from it. When increasing w_p to 1.0, the other objectives drop to an accuracy of
nearly 0 as expected (except instrument). The pitch objective, however, does
not benefit from this either, since its accuracy decreases in comparison to the
equal weighting (w_p = 0.2). The baseline comparison to a random weighting,
which randomly changes w_p on each call, also shows no improvement.


In total, weighting for prioritization was not successful, since equal
weighting did indeed perform better. This result may be explained by a
destabilization of the gradient descent. If so, parameters like batch size and
learning rate must also be reconsidered here. In any case, this topic is complex
and worth investigating in future work, e.g., by applying a Chebyshev loss or
other strategies from the domain of multi-objective optimization.
4.7 Rare events 


Music compositions typically exhibit numerous redundancies, such as repeti- 
tions, reprises, and motivic variations. As a result, music data often is imbal- 
anced. In the simulation, we observe that the main character spends roughly 
half of the time outside in the town scene (due to random walking), leading 
to a significantly higher presence of the corresponding music. Furthermore, 
location changes account for only a maximum of 5% of the data, resulting 
in a severe underrepresentation of individual musical transitions, which are 
about 5 to 10 per possible location transition. This may explain why none of the
training sessions could reach 100% accuracy and why, e.g., instrument
performs well even if zero-weighted (cf. Table 4, where w_p = 1.0). To address
this circumstance, heavy dataset balancing [25] is needed. One approach that
does not remove samples from the dataset is to use the loss values as an
indicator of success and to prioritize samples of higher loss during training,
either by loss weighting or by dynamic adjustment of selection probabilities.
This topic also remains work for future investigations.
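
One way to realize the dynamic adjustment of selection probabilities is to draw training samples with a probability proportional to their most recent loss. The sketch below is only an illustration of that idea under this assumption; the function names are hypothetical, and this strategy was not evaluated in the experiments above.

```python
import numpy as np


def selection_probabilities(sample_losses, eps=1e-8):
    """Turn per-sample losses into sampling probabilities that sum to 1."""
    losses = np.asarray(sample_losses, dtype=float) + eps  # avoid all-zero
    return losses / losses.sum()


def draw_batch(sample_losses, batch_size, rng):
    """Draw sample indices, favoring samples with a higher loss."""
    probs = selection_probabilities(sample_losses)
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)


rng = np.random.default_rng(0)
batch = draw_batch([0.1, 0.1, 5.0], batch_size=16, rng=rng)
```

Samples that the model already predicts well are then visited less often, while rare and poorly learned transitions are revisited more frequently, without removing any sample from the dataset.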


5 Conclusions 


We have seen that music as a problem domain provides a rich ground for 
diverse research questions. Datasets with adaptive music from the video game 
Monkey Island 2 can be easily generated using our openly available gener- 
ator. We have also provided experimental setups and analyses covering the 
problem dimensions of neural network architecture and type, context length, 
data input representation, multi-task learning, complexity reduction, multi- 
objective optimization and rare events. These aspects are critically important 
and, in some aspects, unique to the task of adaptive music generation. We have 
presented numerous ideas for future work. Implementing more sophisticated
multi-objective loss strategies and coping with rare events by adaptive dataset
balancing are both worth further investigation.


The concept of autoregressive music generation also holds the potential to 
be applied in the creation of original compositions for video games featuring
adaptive music. Surprisingly, this powerful component [26] is still
underutilized in the industry. The notion of using variational autoencoders controlled by
a reinforcement learning agent is intriguing. A solution to this challenge surely 
is of interest not only to the video game industry but also to broader contexts, 
as the future trend is clearly heading towards “non-linear media” [27]. 
Abstract 


Concrete, the second most consumed resource worldwide after water [1], plays 
a fundamental role in construction. However, modeling the production process 
of concrete is challenging due to the incompletely understood physical and 
chemical relationships among its ingredients and various influencing factors, 
and the scarcity of data. Current models rely predominantly on mix
design (recipe) data, often overlooking the properties of fresh concrete, the 
interactions that result from curing conditions, and disturbances. This paper 
introduces a holistic view that integrates mix design, fresh concrete properties, 
and curing conditions to enhance predictive models for ultra-high performance 
concrete (UHPC) quality. This analysis highlights the significant effect of 
average power consumption, fresh concrete temperature, and curing storage 
conditions on the quality of concrete. 
1 Introduction 


Concrete is formulated from cement, aggregates (both fine and coarse), water, 
and occasionally, admixtures. Its production process begins with the com- 
bination of these raw materials, followed by a curing process to ensure the 
end product quality (Figure 1). The curing process typically requires main- 
taining specific moisture and temperature conditions for 28 days. This allows 
the cement to undergo hydration, the pivotal chemical reaction that imparts 
strength to concrete. The process’s intricacy lies in the delicate balance of these 
components and stages, as well as its susceptibility to external environmental 
influences, resulting in potential variances in concrete quality [2]. The basic 
composition of conventional concrete is predominantly characterized by the 
amalgamation of primary constituents: Cement, fine and coarse aggregates, 
and water. However, advancements in concrete technology have underscored 
the integration of supplementary cementitious materials to optimize specific 
mechanical and rheological properties. Materials such as fly ash, silica fume, 
blast furnace slag, and superplasticizers, when judiciously incorporated into 
the mix, can enhance both the compressive strength (CS) and the workability 
of concrete. This yields, e.g., high-performance and ultra-high performance 
concrete (Table 1 [3]). 


Table 1: Differences between conventional (CC), high-performance (HPC), and ultra-high 
performance concrete (UHPC) recipes and properties [3]. CS: Compressive strength. 


Concrete type   Cement in kg/m³   Water/binder in %   Workability in mm   CS in MPa
CC              260 - 380         0.45 - 0.65         -                   20 - 50
HPC             400 - 700         < 0.4               455 - 810           50 - 100
UHPC            800 - 1000        0.2 - 0.3           260                 > 100


Concrete production presents a multitude of challenges that influence the qual- 
ity and consistency of the end product. The complexity of the process is in- 
fluenced by the intrinsic properties of the raw materials, the mixing conditions 
and tools, the environmental factors, and the storage conditions (Figure 2). 
Traditional paradigms in concrete production modeling have predominantly 
gravitated towards mix designs, emphasizing input proportions and types [5, 6]. 
The mere act of mixing predetermined quantities does not invariably guarantee 
uniformity or the sought properties in the resultant concrete. Characteristics 
of fresh concrete, such as temperature and workability, are quintessential in 
predicting the final product quality [7]. Adding to this complexity, the curing 
process is inherently dynamic. Adjustments made herein, be it due to exter- 
nal environmental conditions or the targeted properties of the concrete, can 
substantially reshape its micro-structure, and by extension, its macro-behavior. 
Overlooking these complex nuances could culminate in a limited understand- 
ing of the production process, potentially manifesting as inconsistencies, inef- 
ficiencies, or even structural vulnerabilities [4]. 


The primary objective of this contribution, therefore, is to grasp the extent to 
which these multifaceted factors might shape the process and discern strategies 
to modulate or adapt them, ensuring reproducible outcomes. In light of these 
complexities and challenges, a comprehensive framework is proposed in this 
contribution to model the concrete production process. Designed to eclipse 
the constraints of traditional recipe-centric models, this framework assimilates 
insights from fresh concrete characteristics and delves deep into the intricacies 
of the curing process. Our contribution in this work can be summarized as: 


• Determining the important influencing factors on the concrete process.

• Generating data based on the Taguchi orthogonal array L-50 [8] and the
characteristics of fresh concrete.

• Adjusting different curing conditions to analyze their impact.

• Concrete process modeling based on four different approaches: mix design,
fresh concrete, curing conditions, and the entire production process,
along with analysis of the results.


In our previous study [5], it was observed that two benchmark datasets, which 
neglected to consider environmental, mix process, and curing conditions in 
their content, exhibited distinctive behaviors when modeled using data-driven 
algorithms. In this paper, our primary focus is to analyze the exact impact of 
these omissions on modeling the concrete production process. Unlike in our 

Figure 2: Illustration of different curing conditions in the concrete production process: (a) specimens at high temperature, (c) specimens under water, (d) specimens in plastic foil.


previous work where multiple data-driven algorithms were compared, only the 
Gradient Boosting method will be used in this study to discern the effects of 
different modeling approaches. 


2 Traditional Data-driven Concrete Modeling 


Concrete quality estimation models largely fall into traditional models and 
machine learning approaches. The well-known Abrams' law [9] relates the
water-cement ratio (W /C) to the compressive strength (CS) after 28 days: 


$CS = \frac{b_1}{b_2^{W/C}}$ (1)

where b₁ and b₂ are empirical constants. Enhancing this, Zain et al. [10]
introduced multiple linear regression, yielding


$CS = b_0 + b_1 \frac{W}{C} + b_2\,CA + b_3\,FA + C.$ (2)


Here, W denotes water volume, C represents cement, CA stands for coarse 
aggregate, and FA signifies fine aggregate. However, both methodologies ne- 
glect the ambient influences, mixing conditions, and the influence of the fresh 
concrete characteristics and curing conditions. 
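
As a small worked example of the first relation, the sketch below evaluates Abrams' law in its commonly stated form CS = b1 / b2^(W/C). The function name and the values chosen for the empirical constants b1 and b2 are purely hypothetical; in practice they have to be fitted to measured data.

```python
def abrams_strength(w_c: float, b1: float, b2: float) -> float:
    """Compressive strength from the water-cement ratio via Abrams' law."""
    return b1 / (b2 ** w_c)


# Hypothetical constants, chosen only to illustrate the trend.
B1, B2 = 100.0, 4.0
for ratio in (0.3, 0.5, 0.7):
    print(ratio, round(abrams_strength(ratio, B1, B2), 1))
```

For any b2 > 1, the predicted strength decreases monotonically as the water-cement ratio grows, which is the qualitative behavior the law is meant to capture. Mixing conditions, fresh concrete properties and curing conditions do not enter the formula at all.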


Modeling the concrete production process using traditional algorithms is chal- 
lenging due to the many partially known effects on CS. Ling et al. [11] found 
that among Support Vector Machine (SVM), Artificial Neural Network (ANN), 
and Decision Tree, SVM was the superior method for studying the impact of 
environmental factors on CS. In contrast, Hoang et al. [12] determined that 
Gaussian Process Regression outperformed both ANN and SVM in estimating 
CS. Ensemble learning regression, however, provided the most accurate results, 
as indicated by [13]. Nevertheless, these studies overlook the properties of 
fresh concrete and curing conditions in their models. 


Ozbay et al. [14] explored the mix proportions of high-strength self-compacting 
concrete using Taguchi’s L-18 experimental design, focusing on six pivotal 
factors to achieve an optimal design. Notably, their work did not consider the 
potential influence of environmental factors and curing conditions on concrete 
production. Safranek [15] delved into the role of the mixing protocol, particu- 
larly examining the effects of mixing speed and time, in concrete production. 
Their findings suggest that UHPC necessitates an extended mixing period com- 
pared to its conventional counterpart to ensure uniformity. However, mixing at 
too high a speed could initiate thermal consequences, which might interfere 
with the chemical processes during blending. Cazacliu et al. [16] embarked on 
an investigation focusing on the importance of power usage patterns during the 
mixing process. 


Assessing the workability of fresh concrete is vital, with the slump flow test 
being a key method [7, 17]. Kemer et al. [18] refined the correlation between 
yield stress and slump results. Hoang and Pham [19] employed LS-SVR for 


220 Proc. 33. Workshop Computational Intelligence, Berlin, 23.-24.11.2023 


slump prediction. Farzampour [20] explored the influence of environmental
conditions during curing and of various cement types on concrete's
compressive strength. The findings highlighted that both severe
weather conditions in the curing process and the water-to-cement ratio can
significantly affect concrete quality.


While various subprocesses of concrete production have been investigated, 
modeling the entire process considering all major influences remains unex- 
plored. 


3 Holistic Concrete Production Modeling 


In this contribution, a holistic modeling of the concrete production process is
presented, integrating aspects of environmental factors, mix design, fresh
concrete properties, and curing conditions. The Gradient Boosting (GB) algorithm
[21, 22], in conjunction with the recursive feature elimination (RFE) technique
[23], is employed for this purpose. The selection of RFE was based on a
comparative analysis with other standard methods, namely forward feature 
selection and backward feature elimination [24]. Among these techniques, 
RFE demonstrated superior performance, and as such, the outcomes of the 
other methods will not be discussed further. As for the choice of GB, the 
primary focus of this study is not to identify the optimal algorithm for modeling 
the concrete production process but rather to discern the influence of various 
factors on the final product’s quality. Both Random Forest [25] and GB were 
considered in preliminary tests, with GB yielding better results. It’s noteworthy 
that for techniques like RFE, only algorithms capable of inherently determining 
feature importance are viable, further justifying our choice. 


The developed framework operates on a computer powered by an Intel(R)
Core(TM) i9-10900X CPU with 64 GB RAM. Leave-One-Out Cross Validation
(LOOCV) [26] runs with random initialization are conducted to validate
result consistency. Subsequently, the average performance of Gradient
Boosting on the test data obtained through the LOOCV process is reported
and analyzed.
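

The LOOCV procedure described above can be sketched as follows. This is a
minimal illustration, not the authors' implementation: the helper names are
hypothetical, and a trivial mean predictor stands in for Gradient Boosting so
that the example stays self-contained.

```python
def loocv_mae(X, y, fit, predict):
    """Return the mean absolute error over all leave-one-out folds
    together with the individual fold errors."""
    errors = []
    for i in range(len(X)):
        # Hold out sample i, train on all remaining samples.
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        errors.append(abs(predict(model, X[i]) - y[i]))
    return sum(errors) / len(errors), errors


# Stand-in model (NOT Gradient Boosting): always predicts the training mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


# Hypothetical data, for illustration only.
X_demo = [[0.1], [0.4], [0.7], [1.0]]
y_demo = [10.0, 20.0, 30.0, 40.0]
mae, fold_errors = loocv_mae(X_demo, y_demo, fit_mean, predict_mean)
```

With N samples the model is trained N times, which is affordable here because
the data set contains only a few dozen experiments.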
3.1 Modeling Approaches


Mix design refers to the specific recipe, i.e., the combination of raw
materials and their desired proportions (Figure 3). The first modeling
approach (MA 1) covers the mix design and also encompasses the mixing
procedure, including the speed and duration of mixing. The second modeling
approach (MA 2) evaluates only the fresh concrete properties. Additionally,
this modeling approach takes into account the average power consumption
during the mixing process. The distinction between mixer adjustments (speed
and duration) and average power consumption stems from two main factors.
Firstly, mixer adjustments are controllable variables influenced by the type of
raw materials, concrete type, and the desired attributes of the end product.
Secondly, once water is introduced to the mixture, the subsequent mixing
and the corresponding average power consumption offer insights into the
rheological characteristics of fresh concrete. For this reason, mixer
adjustments are analyzed in the first modeling approach (mix design), and
average power consumption is examined in the second one. In the third
approach (MA 3), the effects of fresh concrete properties together with
curing conditions are investigated (Figure 3).


During evaluation, the accuracy of the GB algorithm is assessed in the RFE
process by selecting different numbers of features (3, 4, 5, 6, and 7) for the
three distinct modeling approaches. As a fourth modeling approach (MA 4), a
holistic approach integrates all three to model the entire concrete production
process (Figure 3). In this comprehensive approach, the number of features
(3, 4, 5, 6, and 7) is again varied using RFE to gauge the performance
of the GB algorithm. The results for each approach are then analyzed to
determine the most effective combination of factors for predicting concrete
quality after a curing period of 28 days.


3.2 Recursive Feature Elimination 


Recursive feature elimination is a method designed to address the issue of
feature selection for machine learning algorithms. By training a model
iteratively and systematically eliminating the least important features in each
iteration, RFE ensures that only the most impactful features are retained.
Gradient Boosting, by its nature, assigns feature importances based on how
often a feature is employed to split the data across all trees. This ability
makes GB an appropriate choice for RFE, as it can objectively rank features
and provide a clear criterion for elimination. This recursive process continues
until the desired number of features is retained (Algorithm 1).


[Figure 3, diagram contents.
Modeling approach 1 (Mix Design): cement reactivity, ingredient temperatures,
ingredient moistures, coarse and fine aggregates, superplasticizer, graphite,
water, and binder content, mixing speed and duration.
Modeling approach 2 (Fresh Concrete Properties): temperature, slump flow,
air content, power consumption, conductivity, funnel runtime.
Modeling approach 3 (Fresh Concrete Properties and Curing Conditions):
storage conditions 0-24 h, storage conditions day 2-28.
Modeling approach 4: Holistic Attitude to the Concrete Production Process.]

Figure 3: Illustrating the holistic approach from raw material selection to the curing process,
emphasizing mix design, fresh concrete properties, and curing conditions.
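

The elimination loop of Algorithm 1 can be sketched in a few lines. In this
illustration the importance function is a stand-in: it uses the absolute
correlation between a feature and the target instead of retraining a Gradient
Boosting model, so the importances do not actually change between iterations.
All names and data are hypothetical.

```python
def rfe(feature_names, k, importance_fn):
    """Recursive feature elimination following the structure of Algorithm 1.

    importance_fn(selected) is expected to retrain the model on the
    features in `selected` and return a dict {feature: importance}.
    """
    selected = list(feature_names)
    importances = importance_fn(selected)
    while len(selected) > k:
        weakest = min(selected, key=lambda f: importances[f])
        selected.remove(weakest)
        importances = importance_fn(selected)  # "retrain" on the reduced set
    return selected


def abs_correlation(xs, ys):
    """Absolute Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5


# Hypothetical feature columns and target, for illustration only.
columns_demo = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 1.0, 4.0, 3.0, 5.0],
    "c": [5.0, 3.0, 1.0, 4.0, 2.0],
}
y_demo = [1.0, 2.0, 3.0, 4.0, 5.0]


def importance_demo(selected):
    # Stand-in for the feature importances of a retrained GB model.
    return {f: abs_correlation(columns_demo[f], y_demo) for f in selected}


top_two = rfe(list(columns_demo), 2, importance_demo)
```

Passing the importance computation in as a function keeps the loop independent
of the model, so any learner that exposes feature importances can be plugged in.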


3.3 Gradient Boosting Algorithm 


Gradient Boosting is a machine learning algorithm that aims to construct a
robust predictive model by iteratively building a series of weak learners.
Typically, these learners are decision trees. In each iteration, a new weak
learner is fitted to the pseudo-residuals of the current model, i.e., the
negative gradient of the loss, so that it focuses on the instances that are
still poorly predicted. The entire process is governed by a predefined loss
function, which the algorithm seeks to minimize (Algorithm 2).


Algorithm 1: Recursive Feature Elimination with Gradient Boosting

 1: Input: Training data $X \in \mathbb{R}^{N \times D}$ with N samples and D features,
    targets $y \in \mathbb{R}^{N}$, Gradient Boosting model, desired number of
    features to select k
 2: Output: Selected feature set S
 3: Train the Gradient Boosting model on all features in X to obtain feature
    importances I
 4: $S \leftarrow \{1, \dots, D\}$    (initialize feature set with all features)
 5: $n \leftarrow D$    (initialize with total number of features)
 6: while n > k do
 7:     Remove the feature with the lowest importance from S and the
        corresponding entry from I
 8:     Retrain the Gradient Boosting model on the features in S to obtain
        updated importances I
 9:     $n \leftarrow n - 1$
10: return Feature set S with the k top important features
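

To make the steps of Algorithm 2 concrete, the following sketch implements
gradient boosting for the squared loss L = (y - F)^2 / 2, for which the
pseudo-residuals are simply y - F. The weak learner is a one-split regression
stump, which is an assumption made for brevity; the study itself does not
state which tree depth or library was used.

```python
def fit_stump(X, r):
    """Least-squares regression stump: best single split over all features."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no split possible: fall back to a constant learner
        mean_r = sum(r) / len(r)
        return lambda x: mean_r
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] <= t else rm


def gradient_boost(X, y, n_rounds):
    """Gradient boosting with squared loss, mirroring steps 3 to 10."""
    f0 = sum(y) / len(y)               # step 3: constant minimizing the loss
    pred = [f0] * len(y)
    learners = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # step 6
        h = fit_stump(X, residuals)                       # step 7
        hx = [h(row) for row in X]
        denom = sum(v * v for v in hx)
        # step 8: closed-form line search for the squared loss
        gamma = sum(ri * v for ri, v in zip(residuals, hx)) / denom if denom else 0.0
        pred = [pi + gamma * v for pi, v in zip(pred, hx)]  # step 9
        learners.append((gamma, h))
    return lambda x: f0 + sum(g * h(x) for g, h in learners)


# Hypothetical one-dimensional data, for illustration only.
X_demo = [[0.0], [1.0], [2.0], [3.0]]
y_demo = [0.0, 1.0, 4.0, 9.0]
model = gradient_boost(X_demo, y_demo, 20)
```

For other loss functions only the pseudo-residual computation and the line
search in step 8 change; the overall loop stays the same.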


4 Experiment Design, Data Collection, and Data 
Preprocessing 


4.1 Controllable Influencing Factors 


In order to achieve uniform quality and reproducibility in concrete production,
identifying variables that affect consistency is crucial. This includes factors
like mixing procedures, storage conditions, the presence of admixtures, and
environmental influences, such as temperature and humidity. Tables 2 and 3
list the key factors that were chosen from an initial pool of 25 factors. In this
study, cement is categorized as Cement-reactivity-class = 1 if it had been stored
for long periods (more than one year), and as Cement-reactivity-class = 2 if it
had been stored only briefly (less than 3 months). Table 3 also details two
curing scenarios: Storage-conditions-1T/C (first-day storage conditions after
mixing) and Storage-conditions-28T/C (storage from day 2 to 28). During the
first day, concrete was stored at 95 % humidity (Storage-conditions-1C = 1) or
at 40 % humidity (Storage-conditions-1C = 2). From days 2 to 28, it was kept at
40 % humidity (Storage-conditions-28C = 1) or submerged in water
(Storage-conditions-28C = 2). Given the costly and time-consuming nature of
data collection in concrete production, 50 experiments were planned. Considering
the factors detailed in Tables 2 and 3, and the constraint on the maximum
number of experiments, the Taguchi Orthogonal Array L-50 was employed for
data generation. The Taguchi Orthogonal Array ensures data robustness and
an equal distribution of data points [8]. After curing for 28 days, the CS of the
specimens was determined using a destructive method. For each experiment,
six specimens were tested, i.e. a total of 300 specimens were produced.


Algorithm 2: Gradient Boosting Algorithm

 1: Input: Training data $X \in \mathbb{R}^{N \times D}$ with N samples and D features,
    targets $y \in \mathbb{R}^{N}$, number of boosting rounds M, loss function
    $L(y_i, \gamma)$ measuring the discrepancy between the true target $y_i$
    and the prediction $\gamma$
 2: Output: Final boosted model $F_M(x)$
 3: Initialize the model with a constant (mean value):
        $F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma)$
 4: for m = 1 to M do
 5:     for each data point i do
 6:         Compute the negative gradient (pseudo-residuals):
                $r_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)}$
 7:     Fit a weak learner $h_m(x)$ to the pseudo-residuals using $\{x_i, r_{im}\}$
 8:     Compute multiplier:
            $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i))$
 9:     Update the model:
            $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$
10: return $F_M(x) = F_{M-1}(x) + \gamma_M h_M(x)$


Table 2: Factors of mix design. Factor values span a range, represented by
their designated levels (L: Level). For each aggregate category (coarse and
fine), two distinct aggregates are utilized, labeled as type I and II.

Factor                  | Abb   | Unit   | L1          | L2          | L3          | L4           | L5
Cement reactivity class | CRC   | -      | 1           | 2           | -           | -            | -
Ingredient moisture     | IM    | kg (%) | 3.042 (4 %) | 2.925 (0 %) | 3.159 (8 %) | 3.276 (12 %) | 3.364 (15 %)
Ingredient temperature  | IT    | °C     | 10          | 20          | 25          | 30           | 40
Coarse aggregate I      | CA-I  | kg     | 6.900       | 6.000       | 5.400       | 6.300        | 5.100
Coarse aggregate II     | CA-II | kg     | 8.925       | 10.500      | 11.550      | 9.975        | 12.075
Fine aggregate I        | FA-I  | kg     | 5.100       | 6.000       | 6.600       | 5.700        | 6.900
Fine aggregate II       | FA-II | kg     | 0.863       | 0.750       | 0.675       | 0.788        | 0.638
Superplasticizer        | SP    | kg     | 0.290       | 0.323       | 0.306       | 0.355        | 0.339
Graphite                | GP    | kg     | 0.045       | 0.000       | 0.090       | 0.135        | 0.225
Mixing speed            | MS    | rad/s  | 200         | 350         | 500         | 350          | 350
Mixing duration         | MD    | s      | 300         | 300         | 300         | 210          | 480


4.2 Fresh Concrete Properties 


After each mixing process, the properties of fresh concrete are measured. A 
comprehensive overview of the general characteristics of each property can be 
found in Table 4. Fresh concrete temperature depends on the concrete mix con- 
dition [15], environmental factors, and raw material temperatures. Chemical 
reactions, notably cement hydration, can affect temperature too. High temper- 
atures reduce workability, and low temperatures can extend setting times. 


The air content test gauges the volume of air in fresh concrete as a percentage
of its total volume, affecting durability and strength. While higher air content
enhances workability and freeze-thaw resistance, it diminishes compressive
strength. The average power consumption in concrete production indicates the
mean power utilized for mixing raw materials and overcoming mixture
resistance throughout the entire duration of the process. Environmental
factors and mixer properties can influence the average power needs. Similarly,
chemical reactions, notably between water and cement, can modify the average
power demands. High average power consumption might hint at issues like
insufficient water, while low average power may suggest a weak mix. Electrical
conductivity in fresh concrete largely reflects the ionic content in the liquid
phase. This property can indicate the water-to-cement ratio, vital for
workability and durability.


Table 3: Factors of the Curing Process. Curing condition factors vary across
a range, represented by their designated levels. The numbers before T and C
denote the curing period in days. (T: Temperature; C: Class; L: Level)

Factor                 | Abb    | Unit | L1 | L2 | L3 | L4 | L5
Storage-conditions-1T  | SC-1T  | °C   | 20 | 20 | 10 | 30 | 40
Storage-conditions-1C  | SC-1C  | -    | 1  | 2  | 2  | 2  | 2
Storage-conditions-28T | SC-28T | °C   | 20 | 20 | 10 | 30 | 40
Storage-conditions-28C | SC-28C | -    | 1  | 2  | 2  | 2  | 2


The slump flow test evaluates the flowability of fresh concrete, particularly 
for fluid mixes like self-compacting concrete. Concrete is placed in a slump 
cone with an outlet diameter of 120 mm. When the cone is lifted, the concrete 
spreads, and after t = 30, 60, and 120 seconds, the diameter of the spread 
gives the slump flow test value [7]. High values suggest increased flowability, 
which can lead to issues like segregation, while low values might pose place- 
ment challenges. The funnel runtime assesses the flowability of fresh self- 
compacting concrete by timing its flow through a V-shaped funnel. Extended 
funnel times indicate workability concerns, while short times indicate risks like 
segregation or bleeding. 


4.3 Data Preprocessing 


In data preprocessing, steps were taken to ensure data integrity. Manual checks
were conducted to verify the absence of outliers. The L-50 Taguchi Orthogonal
Array minimizes collinearity risks, and no issues were found. Six missing
values in the fresh concrete characteristics led to the exclusion of the related
experiments, so the analysis is based on the remaining 44 data points. Min-max
normalization was chosen due to the presence of varied scales and feature
types, the absence of negative values, and the lack of outliers. Additionally,
the use of the Taguchi Orthogonal Array inherently facilitated the application
of min-max normalization to ensure consistent interpretation across all
factors. If $X \in \mathbb{R}^{N \times D}$, each entry can be denoted as $X_{ij}$,
where i ranges from 1 to N and j ranges from 1 to D, and the normalized entry
is


$$X'_{ij} = \frac{X_{ij} - \min_j(X)}{\max_j(X) - \min_j(X)} \qquad (3)$$


where $\min_j(X)$ and $\max_j(X)$ denote the minimum and maximum of feature j.


Table 4: Observed Quantities Related to Fresh Concrete

Factor                    | Abb | Unit | Min   | Mean   | Max   | STD
Fresh Concrete Temp.      | FCT | °C   | 17.60 | 25     | 31.90 | 3.94
Air Content               | AC  | %    | 0.40  | 1.93   | 7     | 1.53
Average Power Consumption | PC  | kW   | 0.37  | 0.90   | 1.40  | 0.24
Slump Flow                | SF  | mm   | 120   | 327.33 | 395   | 53.36
Conductivity              | CD  | V    | 4.54  | 4.62   | 4.74  | 0.05
Funnel Runtime            | FR  | s    | 4     | 8.05   | 15    | 2.69
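

The column-wise min-max normalization of Equation 3 can be sketched as
follows. The demonstration values are three levels of ingredient temperature
and mixing speed from Table 2, used purely as an illustration; mapping a
constant column to 0.0 is an assumption of this sketch, since the text does not
discuss that case.

```python
def min_max_normalize(X):
    """Scale every column of X (a list of rows) to the range [0, 1]."""
    columns = list(zip(*X))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    return [
        [
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant column: assumed 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ]
        for row in X
    ]


# Columns: ingredient temperature in °C, mixing speed in rad/s.
X_demo = [[10.0, 200.0], [20.0, 350.0], [40.0, 500.0]]
X_scaled = min_max_normalize(X_demo)
```

In a cross-validation setting the minima and maxima would normally be computed
on the training folds only and then reused for the held-out sample.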


4.4 General setting for experiments 


In the 50 experiments, the same mixing tool was employed (Figure 1).
Environmental conditions for material storage and production were controlled,
mitigating seasonal influences. All experiments utilized a single material batch
for consistent properties. Both the old and new cement were of the same type
and originated from the same factory and production conditions. The mixer
chamber temperature was measured before each experiment. Given that the
laboratory's ambient temperature was consistently maintained at 20 °C, the
mixer chamber temperature was also close to 20 °C. As a result, this factor did
not introduce any variability into the process.


Figure 4: Comparing the prediction accuracy of Gradient Boosting across different modeling
approaches (MAs) and numbers of features selected by RFE. The bar plot
illustrates the average performance on the test data, derived from 44 LOOCV
iterations. (MA 1: Mix Design, MA 2: Fresh Concrete Properties, MA 3: Fresh Concrete
Properties & Curing Conditions, MA 4: Entire Concrete Production Process)


5 Results and Discussion 


In this study, the prediction accuracy of Gradient Boosting is analyzed under
four modeling approaches and for different numbers of features selected by RFE
(three, four, five, six, or seven) (Figure 4). The objective is to find the
combinations of influencing factors from the entire concrete production process
that result in optimal model accuracy. When comparing the prediction
accuracy of the models using the same number of selected features across the
four modeling approaches (Figure 4), model training on the entire concrete
production process consistently yielded the lowest MAE among all modeling
approaches. Specifically, utilizing the complete concrete production process with
six features yielded the most accurate results, achieving an MAE of 6.65. The
mix design consistently exhibited the largest error, indicating that this subset
of data might not be as informative for predictions as either the fresh
concrete and curing conditions data or the comprehensive data from the entire
process. In summary, adding more features does not always guarantee enhanced
performance across all modeling approaches. The data underscore the need
for careful feature selection.

In the evaluation of how often each feature was selected across the 44 LOOCV
iterations, distinct patterns emerged for the considered modeling approaches
(Figure 5). Within the mix design modeling approach, Ingredient-temperature
(44 times) and Mixing-duration (44 times) distinctly stood out, highlighting
their central role in modeling the recipe. Additionally, Mixing-speed
(41 times), Superplasticizer (40 times), and Graphite (36 times) are of notable
significance, reinforcing their essential roles in the mix design modeling
approach. Conversely, Cement-reactivity-class (1 time) and Coarse-aggregate-II
(2 times) showed minimal importance.
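

The selection frequencies reported here are counts of how often RFE kept a
feature across the LOOCV iterations. A minimal sketch of that bookkeeping is
given below; the three example folds are hypothetical and do not reproduce the
study's results.

```python
from collections import Counter


def selection_frequency(selected_per_fold):
    """Count in how many folds each feature was selected by RFE."""
    counts = Counter()
    for selected in selected_per_fold:
        counts.update(set(selected))  # count each feature once per fold
    return counts


# Hypothetical RFE outcomes of three folds (abbreviations as in Figure 6).
folds_demo = [
    ["PC", "FCT", "SC-28T"],
    ["PC", "SC-28T", "SP"],
    ["PC", "FCT", "SC-28T"],
]
freq = selection_frequency(folds_demo)
```

A feature that reaches the number of folds, 44 in the study, was selected in
every single iteration.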


The modeling approach based only on fresh concrete properties (MA 2) is not
elaborated upon in this discussion, because only six fresh concrete features
exist, which matches the number of inputs selected in the considered
setting. In the fresh concrete properties and curing conditions modeling
approach, average Power-consumption (44 times), Storage-conditions-28T
(44 times), and Storage-conditions-1T (44 times) are consistently selected,
emphasizing their significant roles in the modeling. Furthermore,
Fresh-concrete-temperature (42 times) and Air-content (41 times) made
significant appearances, underscoring their relevance. In contrast, electrical
conductivity and Slump-flow are less influential.


In the entire concrete production process, the features average
Power-consumption, Fresh-concrete-temperature, and Storage-conditions-28T each
appeared 44 times, underscoring their critical roles. Storage-conditions-1T
(43 times) and Superplasticizer (38 times) also held significant positions.
However, features like Air-content (12 times), Funnel-runtime (8 times),
Slump-flow (2 times), and electrical conductivity (1 time) are less prominent.
In conclusion, when examining combinations for a comprehensive representation
of the concrete production process, the data from the entire process suggest
that average Power-consumption, Fresh-concrete-temperature,
Storage-conditions-28T, Storage-conditions-1T, Superplasticizer, and Graphite
are the most vital. This combination promises a comprehensive and accurate
modeling of the concrete production process.
Figure 6: Illustrating the percentage importance of features selected for predicting the compressive
strength of concrete, as determined by the chosen GB algorithm, using six inputs and 
the entire concrete production process as the modeling approach. The SC-28T feature 
exhibits the highest importance. SC-28T: Storage-conditions-28T, PC: Average Power 
Consumption, SC-1T: Storage-conditions-1T, FCT: Fresh Concrete Temperature, SP: 
Superplasticizer, GP: Graphite 


In Figure 6, a detailed breakdown of the feature importance is presented. This 
breakdown was determined from a model that was identified from a series of 
models trained using various approaches based on LOOCV. Among all these 
modeling approaches, the one that delivered the best accuracy performance 
was selected. Within this chosen approach, several models were generated due 
to the nature of LOOCV. From these models, the one exhibiting a performance 
closest to the average performance over LOOCV was selected. The feature 
importances displayed in Figure 6 are derived from this specific model. The 
chart illustrates that the feature Storage-conditions-28T is of the highest impor- 
tance, contributing 55 % to the decision-making process of the model. This is 
followed by average Power-consumption at 15 %, with the remaining features 
each contributing less than 11 %. In general, that means during the monitoring 
of the concrete production process, from mix design to the final fresh concrete 
state, one can predict the eventual quality of the end product. If this predicted 
quality falls short or is not up to the desired standard, modifications can be 
made to the curing conditions. By implementing these suitable adjustments, it 
becomes feasible to achieve the desired quality for the final product, ensuring 
that the concrete aligns with or surpasses the established benchmarks. 
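

The model whose importances are shown in Figure 6 is described as the LOOCV
model whose performance is closest to the average. A minimal sketch of that
selection step follows; the function name and the example errors are
hypothetical.

```python
def closest_to_mean(fold_errors):
    """Return the index of the fold whose error is closest to the mean error,
    together with that mean."""
    mean_error = sum(fold_errors) / len(fold_errors)
    index = min(
        range(len(fold_errors)),
        key=lambda i: abs(fold_errors[i] - mean_error),
    )
    return index, mean_error


# Hypothetical absolute errors of four LOOCV models.
errors_demo = [4.0, 9.0, 6.5, 7.0]
best_index, mean_error = closest_to_mean(errors_demo)
```

Choosing the model nearest to the average avoids reporting importances from an
unusually good or unusually poor fold.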
6 Conclusion and Future work


In our previous study [5], it was observed that two benchmark datasets, which 
neglected to consider environmental, mix process, and curing conditions in 
their content, exhibited distinctive behaviors when modeled using data-driven 
algorithms. The presented research underscores the intricacies inherent in the 
concrete production process and the significance of incorporating mix design, 
fresh concrete properties, and curing conditions to enhance predictive models 
for UHPC quality. With this perspective in mind, modifications can be made to 
the curing conditions. By implementing these suitable adjustments, it becomes 
feasible to achieve the desired quality for the final product, ensuring that the 
concrete aligns with or surpasses the established benchmarks. 


This contribution also emphasizes that it is not necessary for modeling to 
measure all factors in the concrete production process. This insight is par- 
ticularly valuable for concrete plants, considering the costs associated with 
sensors and the monitoring process. This investigation identified the crucial 
factors pivotal in enhancing the predictive model’s precision, namely: aver- 
age Power-consumption, Fresh-concrete-temperature, Storage-conditions-28T, 
Storage-conditions-1T, Superplasticizer, and Graphite. However, it is worth
noting that this study was conducted under laboratory conditions. In a real
concrete plant, the situation might differ. For instance, controlling the curing
process is difficult. Wear of the mixing tools and outdoor storage of raw
materials, especially in harsh weather before mixing, can impact product quality.


For our subsequent steps, we aim to generate more data and delve deeper into 
modeling the concrete production process. 
1 Introduction


When cultural assets are stored and exhibited in historic buildings,
temperature and relative humidity play an important role in their long-term
preservation. To prevent accelerated decay of the cultural assets, the indoor
climate components must, according to the field of preventive conservation
(German: Präventive Konservierung, PK), lie within prescribed constant and
dynamic limit ranges. Particular relevance is attached to the relative indoor
air humidity and its rate of change [1]. Implementing a model predictive
control (MPC) approach makes it possible to take the conservation requirements
into account in the control law in the form of constraints on the controlled
variables. Such a model predictive control approach is to be realized in the
Hrabanus-Maurus-Saal (HRMS) within the library of the Bischöfliches
Priesterseminar in Fulda. Historically valuable manuscripts are stored there,
which must meet the conservation requirements mentioned above. Before the MPC
can be put into practice in this sensitive storage environment, the controller
must be tested and evaluated in simulation in order to guarantee reliable
control operation. This requires a simulation model of the HRMS that
approximates the hygrothermal behavior of the hall with high accuracy.
Various building simulation environments, such as EnergyPlus [2] or TRNSYS
[3], allow the hygrothermal behavior of a building, including the rooms it
contains, to be reproduced in detail. However, using such a building
simulation environment requires a large amount of building-physics
information, including the exact building dimensions, the wall construction,
and the building materials used. Since very little building-physics
information is available for the HRMS, a data-driven modeling approach has to
be chosen here. Long-term measurements inside and outside the HRMS provide a
large database. This makes it possible to identify a simulation model of the
HRMS, which can subsequently be used for controller evaluation. The fact that,
in addition to the internal process model of the MPC, a further and far more
detailed model of the controlled process is required is described in [4].
This contribution covers the dynamic modeling of the hygric behavior of the
HRMS. A hygrothermal model of the room, which would also include the heat
transfer behavior, is not considered here. The following sections present
different model approaches for data-driven system identification. After the
models have been trained with the measurement data of the HRMS, they are
compared with one another. The contribution concludes with a discussion of
the results.


2 Data-Driven Modeling


2.1 Database


The database covers a period of 140 days from 20 December 2022 to 8 May 2023,
with the measurement data recorded at a sampling time of 10 minutes. The data
were recorded partly in closed loop and partly in open loop. In closed loop,
two-point controllers with thresholds were used, whose controller parameters
were changed several times. In between, the loop was opened in order to apply
test signals in the form of APRB and chirp signals to the process. The
recordings are divided into training, validation, and test data, with the
training data representing 90 % of the total data, while 5 % each is reserved
for validation and test data. The partitioning of the data sets depends on
many factors: for example, the seasonal influences on the process behavior as
well as sections with pronounced actuator interventions must be contained in
the training data set in order to obtain a representative database. Each of
the three partial data sets forms a contiguous time series in order to
preserve the temporal relationship of the data. The test data set contains the
beginning of the total data, while the validation data set is located at the
end of the total data set. The remaining data are part of the training data
set.


Table 1: Overview of the measured quantities for the identification of the simulation model

No. | Symbol        | Description
1   | $\vartheta_R$ | Room temperature
2   | $\vartheta_A$ | Outdoor temperature
3   | $\dot{Q}$     | Heating intervention in the room
4   | $\dot{Q}$     | Global radiation
5   | $\dot{m}$     | Humidification and dehumidification intervention in the room
6   | $\varphi_A$   | Relative outdoor air humidity
7   | $\varphi_R$   | Relative indoor air humidity
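

The chronological partitioning described above (test block at the beginning,
validation block at the end, training data in between) can be sketched as
follows. The sample count assumes a gap-free recording over 140 days at a
10-minute sampling time, which the text does not state explicitly; the function
name is hypothetical.

```python
def chronological_split(n_samples, test_fraction=0.05, validation_fraction=0.05):
    """Split sample indices into contiguous blocks: test data at the start,
    validation data at the end, training data in between."""
    n_test = round(n_samples * test_fraction)
    n_val = round(n_samples * validation_fraction)
    test = range(0, n_test)
    train = range(n_test, n_samples - n_val)
    validation = range(n_samples - n_val, n_samples)
    return train, validation, test


# 140 days, 24 hours per day, 6 samples per hour (assuming no gaps).
samples = 140 * 24 * 6
train_idx, val_idx, test_idx = chronological_split(samples)
```

Keeping each block contiguous preserves the temporal order within the block,
which matters for models that use lagged inputs and outputs.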

During data acquisition, the measured quantities listed in Table 1 were
recorded. For identification, measured quantities one to six are used as
system inputs, whereas the relative indoor air humidity $\varphi_R$ is defined
as the system output and represents the quantity to be identified. This
results in a multivariable system with several inputs and one output
(multi-input single-output system, MISO system). For training the models, all
data are scaled to a uniform value range.


2.2 Takagi-Sugeno NARX Models


The collected data are all time series, which can be approximated with ARX
time series models (autoregressive model with exogenous input). In the
following, a linear-affine SISO ARX model is illustrated as an example.


$$\hat{y}_{ARX}(k) = \boldsymbol{\theta}\,\boldsymbol{x}
  = (b_1, \dots, b_{n_b}, -a_1, \dots, -a_{n_a}, 1)
  \begin{pmatrix} u(k-1) \\ \vdots \\ u(k-n_b) \\ y(k-1) \\ \vdots \\ y(k-n_a) \\ 1 \end{pmatrix} \qquad (1)$$
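

Equation 1 can be illustrated with a short sketch that builds the regression
vector from past inputs and outputs and evaluates the one-step prediction.
The function names and the first-order example are hypothetical; the parameter
list follows the ordering (b_1, ..., b_nb, -a_1, ..., -a_na, offset) of the
equation.

```python
def arx_regressor(u, y, k, n_b, n_a):
    """Build x(k) = [u(k-1), ..., u(k-n_b), y(k-1), ..., y(k-n_a), 1]."""
    if k < max(n_b, n_a):
        raise ValueError("not enough past samples for the requested lags")
    past_u = [u[k - i] for i in range(1, n_b + 1)]
    past_y = [y[k - i] for i in range(1, n_a + 1)]
    return past_u + past_y + [1.0]


def arx_predict(theta, x):
    """One-step prediction: product of the parameter row vector and x."""
    return sum(t * v for t, v in zip(theta, x))


# Hypothetical first-order model y(k) = 0.5 u(k-1) + 0.8 y(k-1) + 0.1,
# i.e. b_1 = 0.5, -a_1 = 0.8, offset 0.1 (illustration only).
theta_demo = [0.5, 0.8, 0.1]
u_demo = [1.0, 1.0, 1.0]
y_demo = [0.0, 0.6, 1.08]
x2 = arx_regressor(u_demo, y_demo, k=2, n_b=1, n_a=1)
y2_hat = arx_predict(theta_demo, x2)
```

In a simulation, the measured values in `y` would be replaced by earlier model
outputs, so prediction errors can accumulate over the horizon.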


Because of the underlying nonlinearities in heat and moisture transfer
processes, linear ARX models according to Equation 1 are only suitable to a
limited extent when the estimated ARX model is to be used for a simulation
(infinite prediction horizon) of the process. To approximate nonlinear
processes, the relevant literature therefore proposes various extensions,
which are grouped under the term NARX models (N for nonlinear). NARX models
extend the ARX model structure in Equation 1 by a nonlinear function
$f(\cdot)$, as shown in the following Equation 2.


$$\hat{y}_{NARX}(k) = f(\boldsymbol{\theta}\,\boldsymbol{x}) \qquad (2)$$


Takagi-Sugeno (TS) fuzzy models describe a class of nonlinear models with a
fuzzy rule base whose consequents form one or more crisp output variables,
e.g., via ARX models. Such TS fuzzy models are often referred to as TS-NARX
models for short. In the remainder of this contribution, TSNX is used as the
abbreviation.

To estimate a suitable TSNX model, the LMN toolbox [5], together with the
LOLIMOT construction algorithm implemented in it, was used.


$$\hat{y}_{TSNX}(k) = \sum_{i=1}^{n_m} \Phi_i(\boldsymbol{z})\,\boldsymbol{\theta}_{TSNX,i}\,\boldsymbol{x} \qquad (3)$$


The fuzzy basis functions $\Phi_i(z)$ take values between zero and one. They are obtained by normalizing the membership degrees:

$$
\Phi_i(z) = \frac{\mu_i(z)}{\sum_{j=1}^{n_m} \mu_j(z)}, \qquad \sum_{i=1}^{n_m} \Phi_i(z) = 1. \qquad (4)
$$

All membership functions (MF) $\mu_i(z)$ are defined as Gaussian bells. With center $v$ and standard deviation $\sigma$, a Gaussian MF reads

$$
\mu_{\mathrm{Gauss}}(x) = \exp\left(-\frac{(x - v)^2}{2\sigma^2}\right). \qquad (5)
$$
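To make the interplay of membership functions, normalized basis functions and local model outputs more concrete, the following minimal Python sketch evaluates a TS model for a scalar scheduling variable. The restriction to one dimension, the function names and the example values are simplifying assumptions; the model in this contribution uses a four-dimensional scheduling vector and was built with the LMN toolbox.

```python
import math


def gaussian_mf(x, v, sigma):
    """Gaussian membership function with center v and standard deviation sigma."""
    return math.exp(-((x - v) ** 2) / (2.0 * sigma ** 2))


def basis_functions(z, centers, sigmas):
    """Normalized membership degrees; the returned values sum to one."""
    mu = [gaussian_mf(z, v, s) for v, s in zip(centers, sigmas)]
    total = sum(mu)
    return [m / total for m in mu]


def ts_output(z, local_outputs, centers, sigmas):
    """Blend the outputs of the local models with the basis functions."""
    phi = basis_functions(z, centers, sigmas)
    return sum(p * y_i for p, y_i in zip(phi, local_outputs))
```

Halfway between two equally wide membership functions, both local models contribute equally, so the blended output is the mean of the two local outputs.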


In the present case, the scheduling variable $z$ is a vector containing the outdoor temperature, the indoor temperature, the outdoor relative humidity and the indoor relative humidity. All of these quantities are taken from the previous sampling step:

$$
z = \left[\vartheta_{\mathrm{R}}(k-1),\ \varphi_{\mathrm{R}}(k-1),\ \vartheta_{\mathrm{A}}(k-1),\ \varphi_{\mathrm{A}}(k-1)\right]^T \qquad (6)
$$


The hyperparameters are determined by a systematic search. The numbers $n_a$ and $n_{b,1}, \dots, n_{b,6}$ of past regressors are varied such that influencing variables from the past one, two, four, eight or twelve hours enter the models. With the data set sampled every 10 minutes, this corresponds to 6, 12, 24, 48 and 72 sampling steps per variable. The maximum number of local models is used as the termination criterion of the LOLIMOT algorithm and is varied over the values 20, 40, 80 and 160.
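The conversion from time horizons to regressor counts and the size of the resulting search space can be reproduced with a few lines of Python. Treating the search as a full grid over both hyperparameters is an assumption made here for illustration; the text only states that the search is systematic.

```python
from itertools import product

SAMPLE_MINUTES = 10                      # sampling interval of the data set
horizons_h = [1, 2, 4, 8, 12]            # time horizons in hours
max_local_models = [20, 40, 80, 160]     # LOLIMOT termination criterion

# Number of past sampling steps per influencing variable
lags = [h * 60 // SAMPLE_MINUTES for h in horizons_h]

# Assumed full grid over both hyperparameters
grid = list(product(lags, max_local_models))
```

Under this assumption, `lags` evaluates to the five step counts named in the text and the grid contains 20 candidate configurations.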


The resulting TS model consists of $n_m = 16$ local models with a total of $n_{\mathrm{par}} = 8080$ parameters and uses the influencing variables of the past twelve hours. For the comparison with the other models, the TSNX model is simulated, i.e. the model output is fed back, which yields an output-error (OE) structure.
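The stated parameter count can be checked with a short calculation. The sketch assumes that every local model is affine, with 72 past values for each of the six inputs and for the fed-back output plus one offset; this reading is not spelled out in the text, but it reproduces the reported number.

```python
n_inputs = 6        # exogenous inputs
n_outputs = 1       # autoregressive output
n_lags = 72         # twelve hours at a 10-minute sampling interval
n_local = 16        # local models in the final TS model

# Assumed structure: one weight per regressor plus one offset per local model
params_per_local = (n_inputs + n_outputs) * n_lags + 1
n_par_tsnx = n_local * params_per_local
```

With these assumptions each local model has 505 parameters, and the total is 8080.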


2.3 Local Model State Space Networks 


Local Model State Space Networks (LMSSN) are a new class of nonlinear state-space models whose structure builds on local model networks (LMN) [6]. To approximate the nonlinearities of a process, a rule-based approach similar to that of the TSNX models discussed above is used: several local models, each valid for different operating points, external scheduling parameters, etc., can represent a nonlinear process better than a comparable linear state-space model. Combining these local models into a nonlinear overall model yields a large range of validity. The distinctive feature of LMSSN is that the nonlinear functions $f(\cdot)$ and $g(\cdot)$ in

$$
x(k+1) = f(x(k), u(k)) \qquad (7)
$$
$$
y(k) = g(x(k), u(k)) \qquad (8)
$$

can be approximated in different ways. On the one hand, each $i$-th row of the state equation (7) can be treated as a separate LMN, and a construction algorithm such as LOLIMOT or HILOMOT is then applied to each LMN individually. On the other hand, the entire state equation can be represented by a single local model network in the form of a MIMO LMSSN and then constructed with the same algorithms. The same holds for the output equation (8) of the state-space representation. In addition, the LMSSN toolbox lets the user select individually which inputs and states are affected by nonlinear phenomena. The user can thus deliberately bring prior knowledge into the algorithm and possibly reduce the computational effort. The LMSSN presented in this contribution for modeling the moisture transfer mechanisms is a nonlinear, discrete-time state-space model of second order ($n_x = 2$) with $n_p = 6$ inputs and $n_q = 1$ output:

$$
\hat{x}_j(k+1) = \sum_{i=1}^{n_{m,j}} \left[o_{i,j} + a_{i,j}^T \hat{x}(k) + b_{i,j}^T u(k)\right] \Phi_{i,j}(k), \qquad (9)
$$
$$
\hat{y}_{\mathrm{LMSSN}}(k) = \sum_{m=1}^{n_o} \left[p_m + c_m^T \hat{x}(k) + d_m^T u(k)\right] \Phi_m(k). \qquad (10)
$$


Each row of this state-space representation is defined as a separate LMN, so the LMSSN consists of three individual LMNs in total. The LOLIMOT algorithm splits the first state equation (first LMN) into two local models, $n_{m,1} = 2$. The second state equation (second LMN) consists of only one local model, $n_{m,2} = 1$. The output equation (third LMN) is represented by two local models, $n_o = 2$. Although all influencing variables are declared as split candidates in the LMSSN toolbox, the algorithm only split the first and the third LMN along the first two states. The resulting LMSSN has 45 parameters in total.
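The total of 45 parameters follows from the structure described above if each local model is affine in the two states and six inputs. The following short sketch makes this count explicit; the assumption that only the offset and the state and input weights of each local model are counted is ours.

```python
n_x = 2                      # model order (number of states)
n_u = 6                      # number of inputs
local_models = [2, 1, 2]     # first state LMN, second state LMN, output LMN

# Assumed affine local model: offset + state weights + input weights
params_per_local = 1 + n_x + n_u
n_par_lmssn = sum(n * params_per_local for n in local_models)
```

Each local model then has 9 parameters, and the five local models together have 45.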


2.4 NARX Networks


NARX networks (NXN) are based on Equation 2. The nonlinear function $f(\cdot)$ is approximated by an artificial neural network (ANN). The neurons of the hidden layer use the $f_{\tanh}(\cdot)$ activation function. The parameter matrix $\Theta_{\mathrm{NXN}}$ weights the regressors $x$. $W_{\mathrm{out}}$ applies a further weighting and represents the actual weights of the ANN. In addition, an offset vector $B_{\mathrm{NXN}}$ is added. For the first hidden layer of the NARX network, this yields

$$
\hat{y}_{\mathrm{NXN}}(k) = W_{\mathrm{out}}\, f_{\tanh}\left(\Theta_{\mathrm{NXN}}\,x + B_{\mathrm{NXN}}\right). \qquad (11)
$$


During training, not only the number of regressors but also, for example, the number of neurons is varied. The resulting model has one hidden layer with 10 neurons, and the number of regressors is 24 for each input as well as for the output. At a sampling time of 10 minutes, this corresponds to a period of 4 hours. The Levenberg-Marquardt algorithm is used for training. This configuration results in $n_{\mathrm{par}} = 1701$ parameters. For the later comparison, the model is no longer in the NARX structure used during training, because the model output has to be fed back for simulation. This yields an OE structure. The NARX networks are implemented and trained with the Deep Learning Toolbox of MATLAB.
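The parameter count of the NARX network can be verified in the same way. The sketch assumes a fully connected hidden layer with bias terms and a single linear output neuron with bias; the variable names are illustrative.

```python
n_signals = 6 + 1        # six inputs plus the fed-back output
n_lags = 24              # regressors per signal (4 hours at 10 minutes)
n_hidden = 10            # neurons in the hidden layer

n_regressors = n_signals * n_lags                  # length of x
hidden_params = n_regressors * n_hidden + n_hidden # weights plus offsets
output_params = n_hidden + 1                       # output weights plus offset
n_par_nxn = hidden_params + output_params
```

Under these assumptions the regressor vector has 168 entries and the network has 1701 parameters, in agreement with the stated value.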


2.5 Neural State Space Models 


The basic formulation of Neural State Space Models (NSSM) is given in Equations (12) and (13). Here, $f_{\mathrm{NSS}}$ and $g_{\mathrm{NSS}}$ are ANNs, and $x(k)$ is the state of the system at time step $k$. The system inputs at time step $k$ are denoted by $u(k)$. The vector $p$ contains the weights of the ANNs, which have to be identified during training.

$$
x(k+1) = f_{\mathrm{NSS}}(x(k), u(k), p) \qquad (12)
$$
$$
y(k) = g_{\mathrm{NSS}}(x(k), p) \qquad (13)
$$


Considering the first hidden layer of an NSSM as an example yields Equations 14 and 15. Here, $V_A$, $V_B$ and $V_C$ weight the states and inputs. In addition, and similar to Equation 11, there are the further weightings $W_{AB}$ and $W_C$. Other parameters that have to be identified during training are the offset vectors $B_{AB}$ and $B_C$ [7].

$$
\hat{x}(k+1) = W_{AB}\, f_{\tanh}\left(V_A \hat{x}(k) + V_B u(k) + B_{AB}\right) \qquad (14)
$$
$$
\hat{y}_{\mathrm{NSSM}}(k) = W_C\, f_{\tanh}\left(V_C \hat{x}(k) + B_C\right) \qquad (15)
$$


In a hyperparameter study of the NSSM, for example the number of hidden layers, the number of neurons per layer and the batch size are varied. The number of states is also included in the study. The resulting first-order model has 2 hidden layers with 16 neurons each and is trained with a batch size of 1024 data points. The Adam optimizer is used for training. The resulting model has $n_{\mathrm{par}} = 417$ parameters, which corresponds to the weights of the state network. No output network is trained here, so that $g_{\mathrm{NSS}}$ corresponds to the identity matrix $I$. The NSSM are implemented with the Deep Learning Toolbox and the System Identification Toolbox of MATLAB.
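The value of 417 parameters is consistent with a fully connected state network that maps the single state and the six inputs through two hidden layers of 16 neurons to the next state. The layer layout and the inclusion of bias terms in the sketch below are assumptions that reproduce this number.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


n_states, n_inputs = 1, 6
# Assumed state network: [x(k), u(k)] -> 16 -> 16 -> x(k+1)
state_network = [n_states + n_inputs, 16, 16, n_states]
n_par_nssm = mlp_parameter_count(state_network)
```

The three layers contribute 128, 272 and 17 parameters respectively.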


3 Comparison of Results


The presented models are evaluated on the HRMS test data and then compared with each other. The test data cover a period of 9 days in December 2022. Different error and quality measures are used for the comparison. In addition to the root mean squared error (RMSE) and the sum of squared errors (SSE), the variance accounted for (VAF) is used to assess the models. RMSE and SSE are error measures, whereas VAF is a quality measure. RMSE and SSE quantify the deviation between measured data and model output. VAF indicates how accurately the model reproduces, above all, the dynamics of the measured data, because offset errors have no influence on it. For a detailed explanation of the error and quality measures, see [8]. Table 2 lists the error and quality measures for all models, together with the number of model parameters as an indicator of model complexity. The RMSE and SSE of the NXN, LMSSN and TSNX models are close to each other, whereas the NSSM model achieves the lowest RMSE and SSE. The TSNX model has the highest VAF and reproduces the dynamics of the data with high accuracy.


Table 2: Overview of the error and quality measures and the number of parameters of all models


Model   RMSE [%rH]   SSE [%rH^2]   VAF [%]   n_par
NSSM    0.68          585.5        87.07      417
NXN     1.05         1370.1        82.57     1701
LMSSN   1.02         1318.8        83.21       45
TSNX    1.06         1413.1        94.65     8080


A graphical comparison of the measured data and the outputs of the different model structures on the test data is given in Figure 1. In addition to the model outputs and the corresponding error time series, it shows the scaled model inputs and a violin plot for each model to assess the error distribution. The slightly darker region in each violin plot represents the interquartile range, which contains 50 % of all data. In general, all models show good approximation properties and can approximate the measured data including their dynamics. The error time series and the violin plots show that the error stays roughly within ± 2 %rH, which corresponds to the tolerances of typical humidity sensors. The NSSM model shows the smallest deviation. This is visible not only in the error time series but also in the violin plot of the NSSM model, where the median of the error distribution (white dot) is close to zero. For the NXN and LMSSN models, the error distribution is shifted towards positive values, which means that these models underestimate the test data most of the time. The TSNX model shows an even larger deviation but has the smallest variance. The TSNX model also has the highest model complexity with $n_{\mathrm{par}} = 8080$ parameters. In between are the NXN model with 1701 parameters and the NSSM model with 417 parameters. The LMSSN model has the lowest model complexity with 45 parameters to be estimated.


Figure 1: Overview of the measured humidity data, the model outputs, the error time series, and the scaled inputs of the model structures, where for m a value > 0.5 represents humidification and a value < 0.5 dehumidification, together with a violin plot for assessing the error distribution between measured data and model
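For readers who want to reproduce such a comparison, the three measures can be computed as in the following Python sketch. The formulas are common textbook definitions and are an assumption here; the exact definitions used in this contribution are those given in [8].

```python
import math


def sse(y, y_hat):
    """Sum of squared errors between measurement y and model output y_hat."""
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sse(y, y_hat) / len(y))


def _variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def vaf(y, y_hat):
    """Variance accounted for in percent; a constant offset does not change it."""
    errors = [yi - yh for yi, yh in zip(y, y_hat)]
    return (1.0 - _variance(errors) / _variance(y)) * 100.0
```

A model output that follows the measurement perfectly except for a constant offset of 1 %rH has an RMSE of 1 %rH but still a VAF of 100 %, which illustrates why the TSNX model can combine the highest VAF with the largest RMSE in Table 2.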
4 Discussion and Outlook


Collecting meaningful measurement data is difficult because the conservation requirements for the cultural assets have to be met: rapid changes of the indoor climate parameters must be avoided, so test signals can only be applied to the process with great care and with further criteria, such as the outdoor conditions, taken into account. As a result, important operating points or operating ranges of the process may be covered only insufficiently. In addition, part of the measurement data was recorded in closed loop, which introduces a correlation between the measured variables that can impair successful system identification. The interested reader is referred to the identification conditions in [9]. This contribution shows that, despite these drawbacks of the measurement data set, the investigated model structures generally achieve good approximation quality. Each model has different advantages and disadvantages, and three of the four models exhibit an offset error. This error may result from the correlation between input and output variables, due to which the different estimation methods do not yield consistent results. The TSNX model stands out with a very small variance of the error distribution, but with 8080 parameters it has the highest model complexity. In contrast, the LMSSN model has the lowest model complexity of all models but shows the most pronounced error spread among the investigated models. Another disadvantage of the LMSSN model is its long training time (48 hours) compared with the other models. With regard to its use as a simulation model for evaluating model predictive control approaches, the NSSM model shows potential: it not only has the smallest deviation from the measured data but can also be used together with the Model Predictive Control Toolbox of MATLAB without major porting effort. Future work will focus on further data-driven methods, such as physics-guided neural networks.

We would like to sincerely thank the team of Prof. Dr.-Ing. Oliver Nelles for providing the LMSSN toolbox.


1 Introduction


In recent years, considerable progress has been made in automated driving, mainly through the development of powerful data-driven methods. A key success factor is the availability of large data sets that enable training and testing of data-based models in realistic scenarios. Currently available data sets focus predominantly on motorways and urban areas with well-developed infrastructure. Rural regions have so far received little attention, although they pose particular challenges: sparse infrastructure, partly missing or poorly visible lane markings, uncleared roads in snow, morning fog, wildlife crossings and changing light conditions when driving through forests challenge sensors and models, among other reasons because of the domain gaps to existing data sets with defined Operational Design Domains (ODD).


Data from the rural domain are of great importance for developing and evaluating cross-domain learning-based algorithms. When evaluating object detection algorithms on the Waymo Open Dataset, Sun et al. report a clear drop in performance when the models are trained on data from an urban or a suburban region and validated on the respective other data [1]. Their study examines 3D detections of pedestrians and vehicles with the lidar-based PointPillars model of Lang et al. [2]. This finding underlines the importance of cross-domain data sets and, in particular, the need for recorded drives in rural areas.


This contribution presents the procedure and challenges in recording the new DEMANDAR data set, the setup of the measurement vehicle used (see Figure 1), example data, and the measures for annotating the data. To cover a wide variety of weather conditions, data are collected over a full year on a defined measurement route in the Südwestfalen region. The data set thus shows the same route at different times of day and seasons under different weather conditions.


To the best of the authors' knowledge, no multimodal and multi-seasonal data set containing extensive drives in rural regions is publicly available so far. Furthermore, only few published data sets contain data from automotive radar sensors in addition to lidar and camera data. Another unique feature of the data set is the centimeter-accurate reference localization provided by a real-time kinematic inertial navigation system (RTK-DGNSS/INS).


2 State of the Art


Several data sets already exist for different tasks in automated driving, such as object detection or localization. Data sets often include their own benchmarks for comparing different approaches. This section presents some data sets that are frequently used in automated driving research.


The KITTI data set [3] was one of the first publicly available data sets. It provides synchronized lidar, camera and RTK-GPS/IMU data but no radar data. The KITTI data set contains various scenarios recorded mostly on urban roads in Karlsruhe. Over time, it has been extended with comprehensive annotations and benchmarks for different applications (e.g. road detection [4] or semantic point cloud annotations [5]).


The Waymo Open Dataset [1] was recorded with a fleet of self-driving vehicles, each equipped with several lidar sensors, cameras and an RTK-GPS/IMU system. Again, no radar sensors were used for perceiving the environment. The data set was recorded in various urban and suburban regions in San Francisco, Phoenix and Mountain View, USA, both by day and by night. This makes the Waymo data set one of the most extensive publicly available data sets for automated driving.


The nuScenes data set [6] is one of the first large data sets for automated driving that provides data from automotive radar sensors in addition to lidar, camera and GNSS/IMU data. nuScenes provides high-resolution data for more than 1,000 scenarios, including different weather and lighting conditions. The data set covers approx. 1,000 km of mainly urban road networks in Boston and Singapore. It was later extended with point-wise semantic lidar annotations and a panoptic benchmark [7].


The Oxford RobotCar data set [8] contains lidar and GPS/IMU data as well as stereo camera images. The sensor setup was later extended with a rotating 360° NavTech radar [10]. The RobotCar data sets were recorded by repeatedly driving the same urban route in Oxford, United Kingdom. The drives took place at different times of day, over one year for the original data set and over one month for the radar extension. To the best of the authors' knowledge, the nuScenes and the Oxford Radar RobotCar data sets are so far the only published data sets that cover the complete 360° field of view around the vehicle with lidar, radar and camera modalities.


The Boreas data set [9] was likewise recorded by repeatedly driving an urban route over a whole year, in Toronto, Canada. The data set covers various challenging weather and lighting conditions (e.g. rain, snow, night). The recording platform has a rotating 360° lidar on the roof, a forward-facing camera and a GNSS/IMU sensor that achieves a localization accuracy of up to 2-4 cm. As in the Oxford Radar RobotCar data set [10], a rotating 360° NavTech radar is used. However, the Doppler measurements of the radar, a relevant state quantity, are not included. Both data sets therefore do not exploit one of the main advantages of radar sensors over lidar sensors [11].


The View-of-Delft data set [12] comprises data from a rotating 360° lidar, a stereo camera and an RTK-GPS/IMU navigation system. In addition, a forward-facing 3+1D automotive radar sensor was used. Besides range, azimuth and Doppler measurements, this sensor also provides an elevation measurement. The data were recorded on the urban roads of Delft in the Netherlands.


Table 1 gives a brief overview of the data sets mentioned.


Table 1: Overview of data sets in the field of automated driving


Data set              Camera  Lidar  Radar  GNSS  Location
KITTI [3]             ✓       ✓      ✗      ✓     Karlsruhe
Waymo [1]             ✓       ✓      ✗      ✓     Phoenix, San Francisco
nuScenes [6]          ✓       ✓      ✓      ✓     Boston, Singapore
Radar RobotCar [10]   ✓       ✓      ✓      ✓     Oxford
Boreas [9]            ✓       ✓      ✓      ✓     Toronto
View-of-Delft [12]    ✓       ✓      ✓      ✓     Delft
DEMANDAR              ✓       ✓      ✓      ✓     Südwestfalen


3 Measurement Vehicle


The measurement vehicle is a Nissan Leaf ZE0 equipped with an extensive sensor setup (see Figure 1). The following sections describe the technical setup, the coordinate systems used and the sensor calibration procedure.


Figure 1: TU Dortmund research vehicle


3.1 Technical Setup of the Measurement Vehicle


The measurement vehicle has a mid-range lidar (Ouster OS1), a long-range lidar (Ouster OS2) and six RGB cameras (FLIR Chameleon 3), all mounted on a roof rack. A series-production front camera from Mobileye is located behind the windshield. Four prototype 77 GHz mid-range automotive radar sensors are installed behind the bumpers at the corners of the vehicle. An RTK-DGNSS/INS (GeneSys ADMA-G Eco+) enables centimeter-accurate reference localization.


Figure 2 gives an overview of the technical setup of the test vehicle. The data from the different sensors are collected on a central computer with an AMD Ryzen 7 7700X processor, 64 GB RAM, 8 TB of SSD storage, an Nvidia GTX 1050 Ti GPU and a 24 V power supply. The computer has two PCI CAN FD interface cards from PEAK Systems, which are used via the SocketCAN drivers to communicate with various CAN devices. Among other things, the data from the vehicle's own CAN bus are received, decoded and logged via this interface. Some signals from the vehicle CAN bus, such as the vehicle speed, in turn serve as input signals for the radar sensors and the Mobileye front camera. The other six cameras are connected to the central computer via USB 3.1.


Figure 2: Technical setup of the measurement vehicle (labelled components include the series-production front camera, the RTK-GNSS/INS, the GNSS antenna and the cellular modem)


The two lidar sensors and the RTK-DGNSS/INS are connected to the central computer via a 10 Gigabit Ethernet switch. In addition to the sensors, eight Nvidia Jetson AGX Xavier Developer Kits for particularly compute-intensive applications are installed in the trunk of the research vehicle.


Both lidars have a resolution of 1,024 x 64 points at 20 Hz. The Ouster OS2 long-range lidar has a range of up to 210 m with a field of view of 360° x 22.5°. It thus measures 1,310,720 points per second. The Ouster OS1 mid-range lidar has a somewhat shorter range of up to 170 m with a somewhat larger field of view of 360° x 45°. The mid-range lidar also offers a dual-return mode, in which not only the first but also the second return of a laser pulse is measured. This makes it possible to detect objects located behind other objects (e.g. glass panes or fog). The mid-range lidar can therefore capture 2,621,440 points per second. In addition to a point cloud, the mid-range lidar provides an intensity measurement that captures the reflectivity of the objects. Both lidars also have an integrated IMU with 6 degrees of freedom.
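The point rates quoted above follow directly from the resolution, the frame rate and the number of returns per pulse, as the following short Python calculation shows. The variable and function names are illustrative.

```python
COLUMNS = 1024        # horizontal resolution
CHANNELS = 64         # vertical resolution (laser channels)
FRAME_RATE_HZ = 20    # rotations per second


def points_per_second(returns_per_pulse=1):
    """Number of lidar measurement points per second."""
    return COLUMNS * CHANNELS * FRAME_RATE_HZ * returns_per_pulse


single_return = points_per_second()                    # long-range lidar
dual_return = points_per_second(returns_per_pulse=2)   # mid-range lidar
```

The dual-return mode doubles the point rate because each laser pulse contributes up to two measurements.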


The radar sensors are prototype frequency-modulated continuous-wave (FMCW) automotive sensors. They operate in the 77 GHz band and likewise have a measurement rate of 20 Hz. Both the objects detected by the sensors and the radar targets are recorded.


The FLIR Chameleon3 RGB cameras use a global shutter to avoid unwanted image artifacts for moving objects. The camera resolution is 2,048 x 1,152 pixels (approx. 3.2 megapixels) at 20 Hz. To keep the data rate as low as possible, camera images are recorded as compressed single frames rather than raw images. The Mobileye front camera has integrated object detection and classification and, instead of a camera image, directly provides an object list with detected obstacles and traffic signs.


The RTK-DGNSS/INS consists of a GNSS antenna on the vehicle roof, an LTE cellular modem for receiving NTRIP correction data from ground stations, and the Automotive Dynamic Motion Analyzer (ADMA) unit, whose IMU with 9 degrees of freedom contains high-precision accelerometers and gyroscopes (fibre-optic gyros). The position accuracy is 0.01 m, the accuracy of the measured roll and pitch angles is 0.01° and that of the yaw angle 0.025°. If GNSS reception degrades or fails, for example in tunnels or wooded areas, these sensors provide a very accurate position estimate for a limited period of time.


All environment-perceiving sensor modalities (lidar, radar and camera) cover the full 360° surround view. The comparatively high measurement rates and the large number of sensors result in a correspondingly high total data rate of approx. 350 MB per second. Recording a one-minute drive thus already produces about 20 GB of data.
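The data volume per minute and the resulting recording time can be estimated with a short calculation. The sketch uses decimal units and assumes, purely for illustration, that the entire 8 TB SSD of the central computer is available for recordings.

```python
DATA_RATE_MB_PER_S = 350     # approximate total data rate
SSD_CAPACITY_TB = 8          # storage of the central computer

# Data volume of a one-minute drive in GB (decimal units)
gb_per_minute = DATA_RATE_MB_PER_S * 60 / 1000

# Recording time until the SSD is full, in hours (assumes the full capacity is usable)
hours_until_full = SSD_CAPACITY_TB * 1_000_000 / DATA_RATE_MB_PER_S / 3600
```

At the nominal rate, one minute corresponds to 21 GB, which matches the order of magnitude stated above, and the storage would be filled after a little more than six hours of continuous recording.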
Figure 3: Example visualization of the sensor data.


Figure 3 shows an example of the sensor data. The lidar point cloud is shown in reddish colors. The radar targets are shown as bluish point clouds. The bounding boxes depict the objects detected by the radar sensors, the associated velocity vectors (red line at the tip of the bounding boxes) and the object classes (e.g. blue bounding boxes correspond to vehicles, green boxes to bicycles). The static orthophoto in the background shows the centimeter-accurate localization of the ego vehicle in the lower left-turn lane.


The Robot Operating System 2 (ROS 2) [13] is used as the basic framework in the research vehicle. ROS 2 is an open-source publisher/subscriber framework that originates from robotics. Owing in part to its modularity, ROS 2 is now used in many research vehicles. Its wide adoption is a further major advantage, because a large number of existing ROS 2 packages for data processing can be reused. Autoware [14], an open-source software stack for automated driving, was developed on top of ROS 2. Parts of Autoware, such as the Sensing, Perception and Localization components, are used in the research vehicle presented here.


The power supply of the measurement equipment is completely separated from the vehicle electrical system and is provided by two 12 V batteries with 120 Ah.


3.2 Coordinate Systems


This section describes the coordinate systems of the measurement vehicle. All coordinate systems are right-handed. At least one dedicated sensor coordinate system is defined for each sensor. In addition, there are several vehicle-fixed coordinate systems whose x-axis points forward in the driving direction: base_link (center of the rear axle projected onto the ground), front_axle_center_link (center of the front axle), rear_axle_center_link (center of the rear axle) and front_bumper (foremost center point of the vehicle projected onto the ground).


Figure 4: Coordinate systems of the measurement vehicle. The x-direction is shown in red, the y-direction in green and the z-direction in blue.


Figure 4 shows the different sensor coordinate systems of the measurement vehicle. Of the vehicle-fixed coordinate systems, only the front_bumper coordinate system ($V_{\mathrm{FB}}$) is shown, because the others lie inside the vehicle.


The coordinate systems of the two lidar sensors are marked in the figure by the lidar models. The two coordinate systems lidar_os1_base_link ($L_1$ in Figure 4) and lidar_os2_base_link ($L_2$) are located one above the other in the middle of the vehicle roof, in accordance with the sensor setup. For both lidars, additional sub-coordinate systems are defined for the respective IMU and for the laser.


The mounting positions of the cameras are marked in Figure 4 by the red boxes and the coordinate systems $C$. The image coordinate systems are shown in each case (x-axis pointing right, y-axis pointing down, z-axis pointing into the image). The Mobileye front camera $C_{\mathrm{ME}}$ is installed behind the windshield.


The coordinate systems $R$ of the radar sensors are located at the four corners of the vehicle. Their x-axis does not point in the mounting direction, which is rotated by ±45° relative to the driving direction, but is aligned with the vehicle-fixed coordinate systems and points forward in the driving direction. The mounting angles are accounted for internally in the radar sensors.
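Although this compensation happens inside the radar sensors, the underlying operation is a planar rotation by the mounting angle. The following Python sketch illustrates it under the assumptions that the z-axis points upwards, so that the y-axis points to the left, and that positive yaw angles are counter-clockwise; the function name and the example values are illustrative and not part of the sensor interface.

```python
import math


def rotate_to_vehicle_axes(x_s, y_s, mounting_yaw_deg):
    """Rotate a point from the boresight-aligned frame into a frame whose
    x-axis points in the driving direction (same origin, rotation about z)."""
    yaw = math.radians(mounting_yaw_deg)
    x_v = math.cos(yaw) * x_s - math.sin(yaw) * y_s
    y_v = math.sin(yaw) * x_s + math.cos(yaw) * y_s
    return x_v, y_v


# Target 10 m straight ahead of a sensor mounted at +45 degrees (assumed front left)
front_left = rotate_to_vehicle_axes(10.0, 0.0, 45.0)
# The same target geometry for a sensor mounted at -45 degrees (assumed front right)
front_right = rotate_to_vehicle_axes(10.0, 0.0, -45.0)
```

In both cases the target lies about 7.07 m ahead in the driving direction, offset by the same distance to the left or to the right, respectively.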


Between the coordinate systems of the lidars and the rear camera lies a further coordinate system $A_{\mathrm{GNSS}}$ for the GNSS antenna. The coordinate system of the RTK-DGNSS/INS is not shown in the figure, because the ADMA is installed in the trunk of the vehicle and its coordinate system is therefore located there as well.


3.3 Calibration


The intrinsic calibration of the cameras was carried out with the ROS 2
package intrinsic_camera_calibrator from the Tier4 CalibrationTools
repository¹, which internally uses the calibration functionality [16] of the
OpenCV library [15]. A checkerboard pattern with known dimensions served as
the calibration target.
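
Intrinsic calibration estimates, among other things, the focal lengths and
the principal point of a pinhole camera model. The sketch below shows how
these parameters map a 3D point in the image coordinate system to pixel
coordinates. Lens distortion, which OpenCV's calibration typically also
estimates, is omitted here, and all numeric values are hypothetical rather
than results of the calibration described in this section.

```python
def project_pinhole(point, fx, fy, cx, cy):
    """Project a 3D point (x right, y down, z forward) to pixel coordinates.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    Lens distortion is deliberately ignored in this sketch.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics for a 1920 x 1080 image.
u, v = project_pinhole((2.0, 1.0, 10.0), fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
print(u, v)  # 1160.0 640.0
```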


¹ Tier4 CalibrationTools GitHub repository:
https://github.com/tier4/CalibrationTools


The extrinsic calibration between the individual cameras and one lidar was
performed with the packages extrinsic_interactive_calibrator and
extrinsic_tag_based_calibrator from the same repository. The exact position of
the lidar relative to the vehicle's base_link coordinate system, and the
position of the GNSS antenna relative to the RTK-DGNSS/INS and to the
base_link coordinate system, were measured manually. The transformations
between the various vehicle-fixed coordinate systems (center of the front
axle, center of the rear axle, center of the front bumper, base_link) were
derived from technical data sheets or from a CAD model of the Nissan Leaf ZE0.
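
Since the cameras are calibrated against a lidar and the lidar is measured
against base_link, a camera pose in base_link follows from chaining the two
transformations. The following sketch illustrates this with 4x4 homogeneous
matrices in plain Python. All offsets are hypothetical placeholders, rotation
is restricted to yaw for brevity, and the helper names are not part of the
CalibrationTools packages.

```python
import math


def make_transform(yaw_deg, tx, ty, tz):
    """Build a 4x4 homogeneous transform: yaw rotation plus translation."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def compose(a, b):
    """Matrix product a @ b: apply b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(transform, point):
    """Transform a 3D point given as (x, y, z)."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(transform[i][k] * v[k] for k in range(4)) for i in range(3))


# Hypothetical offsets in meters (not the measured values of the vehicle).
T_BASE_FROM_LIDAR = make_transform(0.0, 1.0, 0.0, 1.8)
T_LIDAR_FROM_CAMERA = make_transform(0.0, 0.5, 0.0, -0.3)

# Chaining both yields the camera pose expressed in base_link.
T_BASE_FROM_CAMERA = compose(T_BASE_FROM_LIDAR, T_LIDAR_FROM_CAMERA)
camera_origin = apply(T_BASE_FROM_CAMERA, (0.0, 0.0, 0.0))
print(tuple(round(c, 3) for c in camera_origin))  # (1.5, 0.0, 1.5)
```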


The intrinsic and extrinsic calibration of the four radar sensors with
respect to the center of the vehicle's front axle, and the calibration of the
Mobileye front camera with respect to the foremost point of the vehicle
projected onto the ground (front_bumper), were carried out by the respective
manufacturer or supplier.


4 Dataset


Since August 2022, the measurement vehicle has driven a defined measurement
route in the South Westphalia region approximately once a week to record data.
The data are currently being processed and annotated. The dataset is planned
for publication in the course of 2024. The following sections present the
measurement route and the annotations planned for the dataset.


4.1 Measurement route


The measurement route comprises residential, industrial, and forest areas as
well as field tracks and federal highways, and thus covers various sections
characteristic of rural areas. The route was selected so that it represents a
wide variety of scenarios. A further important criterion was consistently good
mobile network coverage along the route, so that the NTRIP correction data for
the RTK-DGNSS/INS can be received.


The total length of the measurement route is approx. 23 km. With typical
traffic, the driving time is approx. 35 minutes. The route is located between
the Menden districts of Bösperde and Halingen and the Iserlohn district of
Sümmern. Figure 5 shows the course of the route based on an RTK-DGNSS/INS
measurement. The standard deviation of the position estimated by the
RTK-DGNSS/INS, in meters, is color-coded along the displayed route. Minor
deviations occur mainly where the view of the sky is restricted in a forested
area. On the remaining sections, the standard deviation of the localization is
in the range of a few centimeters.


Figure 5: Course of the measurement route. The color scale indicates the
standard deviation, in meters, of the position determined by the
RTK-DGNSS/INS.


4.2 Annotations


The actual value of datasets for automated driving rests on the availability
of annotated ground-truth data. Reference data for ego-localization, objects,
and weather are planned for the dataset. The reference localization, which is
accurate to the centimeter in most cases, is recorded directly by the
RTK-DGNSS/INS together with its standard deviation.


The object annotations are produced with an offline auto-labelling tool based
on the lidar point cloud, followed by manual validation. Offline annotation
has the advantage that future sensor data are known and can be taken into
account in the decision-making. Through repeated, alternating forward and
backward tracking in time, objects that have only a few or no measurement
points in some time steps can thus also be annotated reliably. In addition to
the detection quality, the dimensions of the bounding boxes of detected
objects can be improved in this way. Figure 6 shows example results of the
automated annotation. The ego vehicle is shown in blue, other vehicles in
green, pedestrians in yellow, and unknown static objects in orange.
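
The benefit of knowing future data can be illustrated with a toy example.
The sketch below is not the algorithm of the auto-labelling tool: it replaces
forward and backward tracking with plain linear interpolation between the
nearest earlier and later observations of a 1D position, and it uses the
largest observed extent as a simple heuristic for the box length. Function
names and values are hypothetical.

```python
def fill_gaps(positions):
    """Fill None entries by interpolating between neighboring observations.

    Gaps before the first or after the last observation are left as None.
    """
    filled = list(positions)
    known = [i for i, p in enumerate(positions) if p is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            filled[i] = (1.0 - w) * positions[a] + w * positions[b]
    return filled


def largest_observed_length(lengths):
    """Heuristic: partial views tend to underestimate, so keep the maximum."""
    return max(length for length in lengths if length is not None)


# Object position per time step; None marks steps without lidar points.
track = [0.0, 1.0, None, None, 4.0, 5.0]
print([round(p, 3) for p in fill_gaps(track)])  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Box length per time step; occluded views yield shorter boxes.
print(largest_observed_length([2.1, 3.9, None, 4.4, 3.0]))  # 4.4
```

An online method at the third time step would only have the two earlier
observations available, whereas the offline variant can also use the later
ones.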


Some misclassifications and undersized bounding boxes for vehicles are also
visible in the figure. At the time of writing, a version of the auto-labelling
tool that was developed for highway (Autobahn) scenarios is still in use. A
new version of the tool that is more universally applicable, and therefore
also delivers better results in rural areas, will be available shortly. Minor
errors are corrected through manual validation of the annotations.


Figure 6: Results of the automated annotation.


The weather data are obtained from publicly available weather databases².


² e.g. https://www.timeanddate.de/wetter/deutschland/menden/rueckblick


4.3 Current status of data acquisition


The measurement route has been driven since August 2022. In this way, more
than 1,100 km of data have been recorded on the route so far in approx. 30
hours. The total volume of data recorded on the measurement route so far
exceeds 20 TB.


In addition to the recordings on the measurement route, the outbound and
return trips from and to TU Dortmund were also recorded (urban environment,
highway, and rural environment), adding a further 70 km of data per drive. The
total volume of recorded data is therefore more than 60 TB over a total
distance of more than 4,500 km.
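
The figures reported for the route and the recordings can be cross-checked
with a few lines of arithmetic. The sketch below treats the stated "over" and
"approx." values as exact, assumes 1 TB = 10^12 bytes, and assumes that the
70 km per drive cover both transfer legs, so the results are rough estimates
only.

```python
ROUTE_LENGTH_KM = 23.0         # length of one lap of the measurement route
LAP_TIME_MIN = 35.0            # driving time per lap with typical traffic
ROUTE_TOTAL_KM = 1100.0        # "over 1,100 km" recorded on the route
ROUTE_TOTAL_H = 30.0           # "approx. 30 hours" recorded on the route
ROUTE_TOTAL_TB = 20.0          # "over 20 TB" recorded on the route
GRAND_TOTAL_KM = 4500.0        # "over 4,500 km" including transfer trips
TRANSFER_KM_PER_DRIVE = 70.0   # additional distance per drive (assumed both legs)

# Average speed implied by lap length and lap time.
avg_speed_kmh = ROUTE_LENGTH_KM / (LAP_TIME_MIN / 60.0)

# Number of drives, estimated in two independent ways.
laps_from_route = ROUTE_TOTAL_KM / ROUTE_LENGTH_KM
drives_from_transfers = (GRAND_TOTAL_KM - ROUTE_TOTAL_KM) / TRANSFER_KM_PER_DRIVE

# Mean data rate on the route, assuming 1 TB = 1e12 bytes.
mean_rate_mb_s = ROUTE_TOTAL_TB * 1e12 / (ROUTE_TOTAL_H * 3600.0) / 1e6

print(f"{avg_speed_kmh:.1f} km/h")           # 39.4 km/h
print(f"{laps_from_route:.1f} laps")         # 47.8 laps
print(f"{drives_from_transfers:.1f} drives") # 48.6 drives
print(f"{mean_rate_mb_s:.1f} MB/s")          # 185.2 MB/s
```

Both estimates of the number of drives are close to 48, which is in line with
a roughly weekly recording schedule over about a year.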


The sensor configuration changed slightly over the course of the measurement
drives. The four radar sensors were installed in autumn 2022. The four side
cameras were only put into operation at the beginning of 2023. In addition,
one of the two lidars failed for several months due to water damage during a
drive in the rain and was only recently replaced by a newer model.
Accordingly, the data rates of the most recent drives are considerably higher
than those of the first drives. Partly because of these circumstances, the
measurement drives will continue until the end of March 2024 in order to cover
a complete year with the same sensor configuration.


5 Summary and outlook


This contribution presented the research vehicle of TU Dortmund, the procedure
for recording and processing a new multimodal and multi-seasonal dataset from
a rural area, and the measurement route used for it. The dataset comprises
lidar, camera, and (automotive) radar data as well as a highly accurate
reference localization from an RTK-DGNSS/INS.


Owing to the long-term data acquisition, a large number of different
recordings are available for each route section, covering varying times of day
and seasons under different weather conditions. This makes it possible to
systematically investigate the influence of weather conditions on the
measurements of different sensors and on algorithms for environment perception
and ego-localization.


The dataset is currently being prepared for publication and will be published
in the course of 2024.


This conference volume contains the contributions to the 33rd workshop
"Computational Intelligence", which takes place in Berlin on 23 and 24
November 2023.


The focus is on methods, applications, and tools for

o fuzzy systems,

o artificial neural networks,

o evolutionary algorithms, and

o data mining methods

as well as the comparison of methods on industrial and benchmark problems.


The results are discussed intensively in an open atmosphere by participants
from universities, research institutions, and industry. It is a good tradition
to present new approaches and ideas at an early stage of development, when
they are not yet fully mature.